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ABSTRACT 


This  paper  gives  an  overview  of  those  aspects  of  simulation  methodology  that  are 
(to  some  extent)  peculiar  to  the  simulation  of  queueing  systems.  A  generalized  semi- 
Markov  process  framework  for  describing  queueing  systems  is  used  through  much  of  the 
paper.  The  main  topics  covered  are:  output  analysis  for  simulation  of  transient  and  steady- 
state  quantities,  variance  reduction  methods  that  exploit  queueing  structure,  and  gradient 
estimation  methods  for  performance  parameters  associated  with  queueing  networks. 


KEYWORDS:  Simulation,  queueing  theory,  output  analysis,  variance  reduction,  general¬ 
ized  semi-Markov  processes,  gradient  estimation. 


1.  INTRODUCTION 


This  paper  is  intended  to  give  the  reader  an  overview  of  those  aspects  of  simulation 
methodology  that  exploit  (to  some  degree)  the  stochastic  structure  of  queueing  systems. 
As  a  consequence,  certain  more  broadly  based  methodologies  are  not  discussed  here;  see 
Chapters  2,  3,  and  8  of  BRATLEY,  FOX,  and  SCHRAGE  (1987)  for  a  more  complete 
picture  of  the  current  state  of  simulation  methodology. 

In  Section  2,  we  argue  that  simulation  is  an  important  numerical  tool  for  the  study  of 
complex  queueing  systems.  Section  3  develops  the  generalized  semi-Markov  process  view  of 
queueing  systems,  which  runs  as  a  unifying  thread  through  much  of  this  paper.  In  Section 
4,  we  describe  the  basic  methodology  for  the  analysis  of  simulation  output  associated  with 
estimation  of  transient  quantities.  Section  5  describes  the  analogues  of  these  methods  for 
the  steady-state  estimation  problem.  The  key  role  of  general  state  space  Markov  chain 
theory  in  the  analysis  of  steady-state  output  is  also  described  there. 

Section  6  describes  some  recent  work  on  exploiting  diffusion  approximations  for  queues 
in  heavy  traffic  to  develop  insight  into  the  question  of  how  the  simulation  run-length  re¬ 
quired  to  estimate  steady-state  quantities  increases  as  the  traffic  intensity  converges  to 
unity.  In  Section  7,  the  ideas  of  variance  reduction  and  efficiency  improvement  axe  in¬ 
troduced.  These  methods  permit  the  simulator  to  improve  upon  conventional  estimation 
methods  by  incorporating  knowledge  of  the  stochastic  system  under  consideration  into  the 
estimator.  Sections  8  through  14  describe  seven  such  techniques  that  exploit  queueing- 
related  stochastic  structure.  Finally,  in  Section  15  two  powerful  new  methods  (based  on 
perturbation  analysis  and  likelihood  ratios)  for  estimating  gradients  of  queueing  perfor¬ 
mance  parameters  are  discussed. 

2.  SIMULATION  AS  A  NUMERICAL  TOOL 

Simulation  is  a  powerful  tool  for  studying  complex  queueing  systems.  Its  popularity 
derives  from  the  following  factors: 

1)  The  basic  idea  underlying  simulation  is  conceptually  simple  to  understand.  As  a  result, 
it  is  a  tool  that  is  accessible  to  a  wide  range  of  users,  including  those  without  a  strong 
background  in  the  theory  of  stochastic  processes. 

2)  By  using  simulation  in  conjunction  with  sophisticated  graphics,  one  can  observe  the 
evolution  of  sample  trajectories  of  the  system.  This  can  provide  an  insight  into  the 


system  that  is  unavailable  from  a  purely  numerical  study,  in  which  the  focus  would  be 
on  computation  of  expectations  of  various  performance  measures. 

3)  Conventional  numerical  approaches  often  rely  on  solving  specifically  structured  systems 
of  equations.  The  appropriate  system  of  equations  differs  from  one  type  of  expectation 
to  another.  For  example,  the  method  used  to  solve  for  the  steady-state  of  a  Markov 
chain  is  quite  different  from  that  used  to  calculate  the  transient  probabilities.  On 
the  other  hand,  simulation  offers  a  methodology  which  can  calculate  expectations  of 
arbitrary  functionals  of  the  system,  without  any  major  change  in  the  basic  approach. 

4)  Conventional  numerical  techniques  are  generally  oriented  to  the  solution  of  stochastic 
systems  having  rather  special  probabilistic  structure  (e.g.  discrete-time  and  continuous¬ 
time  Markov  chains).  By  contrast,  simulation  is  a  general-purpose  tool  for  which  the 
study  of  a  general  discrete-event  system  requires  a  programming  effort  comparable  to 
its  Markovian  counterpart  (in  which,  for  example,  all  inter-arrival  and  service  time 
distributions  are  assumed  exponential). 

As  the  above  points  indicate,  a  principal  advantage  of  Monte  Carlo  simulation  is  that 
implementation  of  such  algorithms  by  practitioners  is  relatively  straightforward,  thereby 
saving  time  and  (possibly)  money.  In  addition,  there  are  often  sound  computational  rea¬ 
sons  for  choosing  a  simulation  algorithm  over  competing  numerical  approaches.  We  will 
illustrate  this  point  with  an  example. 

Suppose  that  one  is  interested  in  the  transient  behavior  of  an  irreducible  closed  Jackson 
network,  containing  n  customers,  with  a<  servers  at  the  *’th  station,  1  <  *  <  d.  The  state 
space  5  of  the  corresponding  continuous-time  Markov  chain  X  =  (X(t)  :  t  >  0)  is  the  set  of 
d-tuples  {(fci,..., Jfed) :  kitX+,ki  + ...  +  kd  =  n),  where  Z+  is  the  set  of  non-negative  integers. 
Note  that 

so  that  the  size  of  the  state  space  S  grows  rapidly  with  n. 

If  one  is  interested  in  calculating  a  =  En/(X(t))  for  /  :  JRd  -»  R.  (En()  denotes  the 
expectation  operator  conditional  on  X{0)  having  distribution  rj),  a  conventional  numerical 
approach  would  typically  rely  on  the  fact  that 

(2.1)  a  =  rf  exp(At)/ 

where  A  is  the  generator  of  X.  (We  assume  throughout  this  paper  that  all  vectors  are 
encoded  as  column  vectors.)  It  is  clear  that  the  complexity  of  a  conventional  numerical 
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algorithm  for  calculating  (2.1)  will  increase  (rapidly)  with  n,  since  such  algorithms  are 
highly  sensitive  to  |$|. 

For  example,  it  is  weil  known  that  the  right-hand  side  of  (2.1)  can  be  expressed  as 

vt'xP(At)f  =  n'  f) 

msO 

One  can  recursively  compute  the  products  gm  =  Amtmf/ml  via  gm  =  Agm-i  t/m ;  each  of  these 
matrix-vector  multiplications  takes  on  the  order  of  |S|2  operations.  This  analysis  suggests 
that  the  complexity  of  this  matrix-vector  multiplication  algorithm  increases  rapidly  with 
n. 

To  be  more  precise,  assume  ||/||  <  1  where  ||/||  =  max{|/(*)|  :  xeS}.  If  p1,...,n<J  are  the 
service  rates  at  the  d  stations,  note  that 

Mil  =  \Axy | :  xeS ) 

»*« 

d 

>=i 

is  independent  of  n.  Thus,  for  a  given  error  tolerance  «,  one  can  select  M(e)  (independently 
of  n)  such  that 

|  £  *j‘Amtm//rn!|  < ». 

Hence,  the  complexity  to  calculate  EMf(X(t))  to  precision  «,  via  the  above  algorithm,  for 
/’ s  satisfying  ||/||  <  1,  is  the  number  of  operations  required  to  calculate 

n*  Y, 

which  is,  of  course,  of  order  nM-a.  (Recall  that  |S|  is  of  order  n4”1).  Thus,  for  a  system 
having  a  large  number  of  customers  n  (or  stations  d),  this  approach  becomes  infeasible. 

On  the  other  hand,  Monte  Carlo  simulation  is  relatively  insensitive  to  the  size  of  the 
state  space.  Let  Xx, X3l...  be  i.i.d.  copies  of  the  process  X  and  consider  the  estimator 


“"“E/W))- 


If  || /||  <  1)  Chebyshev’s  inequality  implies  that 


P{|am-a|>*}<  -5—. 

£  171 


Thus,  the  sample  size  m  required  to  obtain  c-precision  with  probability  1-5  is  independent 
of  |S|.  Furthermore,  we  claim  that  the  number  of  observations  needed  to  generate  am 
is  insensitive  to  |$|.  To  see  this,  note  that  X  can  be  uniformized  via  a  Poisson  process 
N[  )  having  rate  E^>»y  (|a.«|  <  E/i,*,  for  all  xtS).  Since  the  transition  epochs  of  X  form  a 
subsequence  of  the  jump  times  of  N,  it  follows  that  the  number  of  jumps  of  X  up  to  t  is 
dominated  by  JV(t).  But  EN(t)  -  •**  which  is  independent  of  n.  We  conclude  that 

Monte  Carlo  simulation’s  efficiency  is  relatively  insensitive  to  the  magnitude  of  n. 

The  above  analysis  suggests  the  following  rule  of  thumb.  As  the  state  space  of  a 
queueing  system  gets  larger  and  larger,  simulation  becomes  increasingly  more  competitive 
as  a  computational  tool. 

An  additional  attractive  feature  of  Monte  Carlo  simulation,  which  is  important  compu¬ 
tationally,  is  that  the  error  analysis  for  Monte  Carlo  algorithms  is  generally  straightforward. 
For  example,  the  typical  Monte  Carlo  approach  to  calculating  a  parameter  a  which  can  be 
expressed  as  the  mean  of  a  r.v.  X  (i.e.,  a  =  EX)  is  to  generate  i.i.d.  replicates  Xi,X3,...  of 
the  r.v.  X  and  to  use  X(n)  =  n-1  as  an  estimator  of  a.  The  error  in  X(n)  is  usually 

assessed  via  a  confidence  interval.  In  this  setting,  if  EX?  <  oo,  a  simulator  can  simply  use 
the  interval 

(2.2)  *(„)+, 

as  a  100(1-5)%  confidence  interval  for  a.  (In  (2.2),  *(5)  is  the  root  of  the  equation  /’{JV(0,  l)  < 
*(5)}  =  1-5/2  and  *(n)  is  the  sample  standard  deviation.)  Since  the  central  limit  theorem 
justifies  the  asymptotic  validity  of  (2.2)  as  a  100(1  -  6)%  confidence  interval  for  a,  we  see 
that  (2.2)  provides  an  easily  calculated  and  asymptotically  precise  error  assessment  for  the 
estimator  ^(n).  Error  assessments  (even  in  the  form  of  upper  bounds  on  the  error)  are 
typically  quite  hard  to  calculate  for  conventional  numerical  schemes. 

3.  THE  ROLE  OF  GSMP’S  IN  QUEUEING  SIMULATIONS 

In  order  to  describe  the  simulation  of  queueing  systems,  we  shall  find  it  convenient 
to  use  the  formalism  of  generalized  semi-Markov  processes  (GSMP’s).  GSMP’s  form  a 
class  of  stochastic  processes  that  succinctly  describe  the  essential  probabilistic  features  of 
queueing  systems.  A  GSMP  is  characterized  by: 

(3.1)  S:  a  “physical"  state  space  which  is  finite  or  countable  (typically, 

5  will  be  the  set  of  all  possible  queue-length  vectors) 


LTKT 


E(s): 


r.*: 


F(-;  *',*',«,«) : 


the  set  of  events  which  can  occur  in  aeS  (E{»)  will  usually  corres¬ 
pond  to  the  different  arrival  and  departure  events  possible  when 
the  system  is  in  state  a) 

the  probability  of  jumping  from  a  to  given  that  event  «  triggers 
the  transition  from  a  (e.g.  «  might  correspond  to  station  *  com¬ 
pleting  service,  in  which  case  p(a';a,e)  might  represent  the  proba¬ 
bility  of  sending  a  customer  from  station  *  to  station  j;  here  *'  = 
w-ei  +  where  ti  is  the  *’th  unit  vector 

the  rate  at  which  the  clock  corresponding  to  event  e  runs  down  to 
zero  in  state  a  (e.g.  in  a  queueing  network,  r„  might  be  unity  ex¬ 
cept  for  events  e  which  are  “interrupted”  in  state  a,  in  which  case 
r,t  =  0;  such  “interrupted”  events  occur  in  the  modeling  of  pre¬ 
emptive  resume  queueing  priorities) 

the  probability  distribution  which  schedules  a  new  event  e'  in 
state  a',  given  that  the  previous  state  was  a  and  the  transition 
was  triggered  by  e  (e.g.  these  would  typically  be  service  and  inter- 
arrival  time  distributions  in  a  queueing  network). 


We  will  now  illustrate  the  GSMP  modeling  formalism  by  describing  a  general  service 
time/ inter-arrival  time  version  of  an  open  Jackson  network.  We  assume  that  the  net¬ 
work  has  d  stations,  each  station  containing  one  server.  Specifically,  each  server  uses  a  first 
come/first  serve  queueing  discipline;  the  service  times  for  the  consecutive  customers  served 
at  the  t’th  station  are  i.i.d.  with  common  continuous  distribution  G,.  The  external  arrival 
stream  to  the  *’th  station  is  assumed  to  be  a  renewal  process  with  continuous  inter-arrival 
distribution  Fi.  Furthermore,  customers  are  routed  between  stations  according  to  a  Marko¬ 
vian  routing  scheme  with  associated  substochastic  routing  matrix  P  =  (Pt,  :  1  <  i,j  <  d). 
Finally,  the  routing,  inter-arrival,  and  service  time  sequences  described  above  are  all  as¬ 
sumed  independent  of  one  another. 

Given  such  a  network,  we  obtain  a  GSMP  by  letting  5  =  Z+  x  ...  x  Z+  (d  times);  the 
vector  a  =  («!,... ,a4)eS  will  then  represent  the  queue-lengths  (including  the  customer  at 
the  server)  at  each  of  the  d  stations.  A  state  transition  occurs  via  either  of  the  following 
possibilities:  an  external  arrival  event  or  a  departure  event.  Thus,  E[a)  =  {(»,  1)  :  1  < 
»  <  d}  u  {(»,  2)  :  1  <  *  <  d,  at  >  1),  where  (*,  l)  corresponds  to  an  external  arrival  to  station  « 


6 


and  (j,2)  denotes  a  departure  from  station  j.  Note  that  the  continuity  of  the  F<’s  and  G/s 
implies  that  simultaneous  events  can  not  occur,  in  the  sense  that  simultaneous  external 
arrivals  and  departures  are  impossible.  As  for  the  routing  probabilities  p(«';a,e),  observe 
that: 


(*,!))={ 
M))  =  { 


1  if  s'  =  8  +  ei 
0  if  s'  ^  a  +  «< 

Pn  ]£»'  =  »  + 

i-E Up<>  **'  =  «- 


tj  -  e,-, 


«.  > 
1 


1 


All  “speeds”  r„  =  1  and  l),«,e)  =  F,(),  whereas  F(;a',(i,  2),a,e)  =  <?,(). 

To  precisely  define  the  dynamics  of  a  GSMP,  we  use  an  associated  GSSMC  (general 
state  space  Markov  chain).  To  simplify  the  remainder  of  our  discussion  of  GSMP’s,  we 
henceforth  assume  that  the  distributions  F(;a‘,  are  continuous  with  F(0,a',e',a,e)  =0; 
this  permits  us  to  ignore  the  possibility  of  simultaneous  arrivals  and  departures.  We  also 
require  that  r„  >  0  for  some  ««£(«);  otherwise,  the  system  gets  “stuck”  in  state  a.  The  key 
to  the  dynamics  of  a  GSMP  is  to  identify  a  clock  with  each  of  the  events  eeE(a)  that  are 
“active”  in  state  aeS.  In  a  queueing  context,  the  clock  readings  will  typically  correspond  to 
the  amounts  of  time  remaining  until  the  next  arrival  and  departure  events  occur,  for  each 
of  the  arrival  and  service  time  processes  active  in  *.  A  GSSMC  is  obtained  by  applying  the 
method  of  supplementary  variables  to  the  GSMP;  the  idea  is  that  the  GSSMC  describes 
not  only  the  physical  state  of  the  system,  but  also  the  states  of  the  clocks  for  each  of  the 
currently  active  events.  Thus,  the  GSSMC  X  =  (Jfn  :  n  >  0)  will  take  the  form  Xn  =  (5n,C„), 
where  Sn  is  the  physical  state  at  the  n’th  transition  of  the  GSMP,  and  Cn  is  the  associated 
vector  of  clock  readings. 

To  describe  the  state  space  of  X,  let  R+  =  [0,oo),  E  =  U,«s  £(*)>  and  set 


C.  ={celR®  :  c,  >  0  iff  t€E{a), 


e,r„  ^  ee<r„.  for  e  ^  e'  whenever 


c,r„c,.r„.  >  0}. 

Then,  E  =  U.«s({*}  x  <?•)  is  the  state  space  for  the  chain  X.  For  (*,c)eE,  let 

t*  s  =  min{ce/r,e  :  «2?(»)}, 

cl  =  CJ(«.  «)=«€-  t*r„,  eeE(a), 

e*  =  e*(«,e)  =  e  iff  e*  =  0  and  etE(a). 

Note  that  <*  is  the  (unique)  event  triggering  a  transition  from  state  #,  while  t*  is  the 
interval  between  transitions  of  the  GSMP,  beginning  in  state  *  with  clock  vector  e.  At  a 
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transition  from  «  to  •'  triggered  by  event  «,  new  clock  values  are  independently  generated  for 
each  c'eN(a‘,  a,e)  =  E(a')  -  (E(a)  -  {«});  the  clock  values  are  generated  from  the  distributions 
F(-,  a',  «',*,«).  For  e'eO(a',a,e)  s  E(a')  n  (£(*)  -  {«}),  the  old  clock  reading  is  kept  after  the 
transition  so  that  c,<  =  c;,(*,c).  Finally,  for  e'e(E(a)  -  {«})  -  £(«'),  event  e'  ceases  to  be 
scheduled  after  the  transition  so  that  e«<  =  o;  this  occurs  (for  a  departure  event  clock) 
whenever  a  customer  departs  to  leave  the  server  idle,  for  example. 

We  are  now  ready  to  define  the  transition  function  of  X.  For  («,e)e£,  A  =  {«'}  x  {e'eC,« : 
c't  <  o*i  e*E(a')}t  set 

(3.2)  P((a,c),A)=p(»'ia,e-)  II  4o,«.|(‘.*) 

Let  X  =  (Xn  =  (Sn,Cn)  :  n  >  0)  be  the  co-ordinate  process  having  transition  function  (3.2). 
Then,  for  n  >  1, 

n— 1 

(3.3)  A(n)  =  ^t*(5*,Cfc) 

*=o 

is  the  time  of  the  n’th  transition  of  the  GSMP.  Thus,  the  GSMP  Q  -  (Q(t) :  t  >  0)  may  be 
formally  defined  as 

OO 

(3.4)  Q(t)  =  £  5„/(A(n)  <  t  <  A(n  -»•  1)} 

n=0 

where  A(o)  =  0.  If  we  assume  that  A(n)  oo  a.s.  (i.e.  that  the  process  is  non-explosive), 
it  follows  that  (3.2)  through  (3.4)  yield  a  well-defined  process  Q.  The  process  Q  is  said  to 
be  the  canonical  GSMP  associated  with  “problem  data”  as  specified  by  (3.1).  (For  more 
detail,  see  WHITT  (1980)). 

A  key  point  of  the  above  analysis  is  that  a  large  class  of  queueing  network  simulations 
have  Markov  structure.  The  GSMP  framework  is  a  context  which  allows  us  to  precisely 
identify  the  nature  of  the  associated  Markov  structure.  This,  in  turn,  has  profound  impli¬ 
cations  both  for  the  development  of  improved  simulation  algorithms  and  for  the  analysis 
of  existing  methodologies. 

4.  OUTPUT  ANALYSIS  FOR  TRANSIENT  SIMULATIONS 

Let  X  =  (S«  :  n  >  0)  be  the  GSSMC  corresponding  to  a  GSMP.  Let  T  be  a  finite-valued 
(possibly  randomized)  stopping  time  for  X  and  let  Y  =  f(Xn  ■  0  <  n  <  T)  be  an  .Revalued  r.v. 


4  * 


The  transient  simulation  output  analysis  problem  concerns  the  estimation  of  quantities  of 
the  form 

(4.1)  a  =  g(EY) 

where  g :  IR*  -»  R. 

EXAMPLE  1:  Let  A  be  a  real-valued  function  defined  on  the  state  space  S  of  the 
GSMP  Q  and  consider  the  problem  of  estimating  the  parameter  a  =  Eh(Q(t)).  This  is  a 
special  case  of  (4.1),  in  which  g(y)  =  y,  Y  =  f(Sn,Cn  :  0  <  n  <  N(t)  +  1)  =  A(Sw(t)),  and 
T  —  1  =  JV(t)  =  max{n  >  0  :  A(n)  <  t}. 

EXAMPLE  2:  Given  a  subset  A  of  the  state  space  of  Q,  define  S(A)  =  inf{t  >  0  :  Q(t)eA}. 
If  we  set  Y  =  A(T(j4)),  where  T(A)  =  inf{n  >  0  :  5neA),  and  y(y)  =  y,  then  (4.1)  incorporates 
the  problem  of  calculating  a  =  ES(A). 

EXAMPLE  3:  For  two  subsets  A  and  B,  consider  the  problem  of  estimating  a  = 
£{S(i4)|5(A)  <  5(B)}.  The  estimation  of  such  a  conditional  expectation  is  a  special  case 
of  (4.1),  in  which  d  =  2,  Y  =  (Jl(T{A))I(T(A)  <  T(B)),  7(r(A)  <  T(B)))  and  yfyi.ya)  =  y1/y2. 
(To  define  g  on  all  of  Rd,  set  y(y,0)  =  0.) 

EXAMPLE  4:  Given  a  real-valued  function  A,  suppose  that  we  wish  to  estimate  the 
variance  a  =  varA(<J(t)).  Here,  Y  =  (A3(Sw(l)),  d  =  2,  and  yfyi.ya)  =  yi  -  vl- 

EXAMPLE  5:  Let  T  be  a  stopping  time  for  X,Y  =  (fi(Xn:0<n<T),  f2(Xn:0 <n<T))  (/i,/3 
are  real-valued),  and  <1=2.  The  ratio  estimation  problem  a  =  EY[l)/EY(2)  (Y  =  (y(l),F(2)) 
is  the  instance  of  (4.1)  in  which  y(yi,  ya)  =  Vi/ya-  (Define  g{y, 0)  =  0.) 

A  methodology  for  solving  (4.1)  is  straight-forward  and  readily  obtained.  Let  Xu  X2,  ■  ■  ■ 
be  i.i.d.  copies  of  the  process  X ,  and  set  Y{n)  =  n-1  Y(  with  F(0)  =  0  where  Y(  =  f(Xin  : 
0  <  n  <  T<).  An  estimator  a(n)  for  a  is  then  given  by 

a(n)  =  y(F(n)). 

The  following  theorem  summarizes  the  behavior  of  a(n)  (see  SERFLING  (1980),  p. 
118,  for  a  proof  which  can  be  easily  modified  to  cover  the  current  circumstances.) 
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THEOREM  1: 

i)  If  £j|K||  <  oo  and  if  g  is  continuous  in  a  neighborhood  of  EY,  then  a(n)  -*  a  a.s.  as 
n  — * co. 

ii)  If  E\\Y\\2  <  oo  and  if  g  is  differentiable  at  EY,  then  nl!2(a[n)  -  a)  =>  oN{ 0,  i)  as  n 
where  oa  =  Vg(EY)tCVg{EY),  C  =  EYY*  -  EY  FY*,  and  Vg(EY)  is  the  gradient  of  g 
evaluated  at  EY. 

Thus,  under  suitable  regularity  hypotheses,  a(n)  is  strongly  consistent  for  a  (i.e.  con¬ 
verges  to  a  a.s.)  and  satisfies  a  central  limit  theorem  (CLT).  The  CLT  asserts  that  the  rate 
of  convergence  of  <*(n)  to  a  is  roughly  of  order  n-1/3.  Thus,  in  order  to  add  an  additional 
significant  figure  of  accuracy  (i.e.  increase  accuracy  by  a  factor  of  10),  one  has  to  increase 
the  number  of  observations  by  a  factor  of  100.  This  suggests  that  simulation  is  a  relatively 
inefficient  tool  for  obtaining  high-accuracy  solutions.  Of  course,  in  many  applications,  two 
or  three  significant  figures  of  accuracy  is  quite  acceptable  and,  as  indicated  in  Section  2, 
simulation  can  then  be  a  competitive  technique. 

Because  of  the  slow  convergence  rate  of  Monte  Carlo  simulation,  error  analysis  is 
particularly  important.  Typically,  simulators  assess  error  by  producing  confidence  intervals 
for  the  parameter  to  be  estimated.  The  CLT  provided  by  Theorem  1  is  the  key  to  obtaining 
such  confidence  intervals  in  the  transient  setting.  Set 

rtf-TlnWn) 

ft  ‘ 

»=1 

•(»)  =  (V*(?(»))«  C(n)Vg(T (n))*/2, 

and  let  7(n)  =  [a(n)~ *(5)«(n)/n1/f3,  an+*(5)»(n)/n1/3]  where  *(5)  solves  P{iV(0, 1)  <  *(5)}  =  1—5/2. 
The  following  result  states  that  /(n)  is  an  asymptotic  100(1  -  6)%  confidence  interval  for  a. 
(The  proof  is  easy.) 

PROPOSITION  1:  If  £||y||3  <  oo,  g  is  continuously  differentiable  in  a  neighborhood  of 
EY,  and  a2  *  Vg{EYyCVg(EY)  >  0,  then  P{atl(n)}  1-5  as  n  — oo. 

Proposition  1  describes  the  basic  output  analysis  algorithm  for  transient  simulations. 
It  indicates  what  data  needs  to  be  collected  from  the  simulation,  and  how  the  data  must 
be  manipulated  in  order  to  produce  confidence  intervals  that  are  asympotically  valid. 

Several  important  refinements  of  Proposition  1  are  worth  mentioning,  however.  First,  it 
may  be  inconvenient  to  se.  the  run  length  in  terms  of  the  number  of  observations  generated, 
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since  the  amount  of  computer  time  required  to  generate  a  fixed  number  of  observations 
will  in  general  be  random  (e.g.  time  to  simulate  a  Jackson  network  to  time  t  corresponds 
to  a  random  number  of  state  transitions).  It  is  often  more  natural  to  specify  run  length 
in  terms  of  a  computer  budget  t ,  and  to  produce  confidence  intervals  based  on  the  data 
generated  within  that  budget  constraint. 

Let  pufat...  be  the  amounts  of  computer  time  required  to  generate  yi,Y3,...  respec¬ 
tively;  we  make  the  natural  assumption  that  the  random  vectors  (Yu  Pi),  (Fj,  Pj),  . . .  are  i.i.d. 
Note  that  the  number  of  observations  *(t)  =  max{n  >  0 :  £J*=1  ph  <  t}  generated  by  time  t  is  a 
renewal  process.  The  natural  point  estimate  for  a  is  at  =  a(*(t))  and  the  natural  confidence 
interval  to  use  is  It  =  7(*(t)). 

THEOREM  2: 

i)  If  E\\Y\\  <  oo,  Pi  <  oo  a.s.,  and  if  g  is  continuous  in  a  neighborhood  of  EY,  then  at  —  a 
a.s. 

ii)  if  £||y||3  <  oo,  EPi  <  oo,  and  if  g  is  continuously  differentiable  in  a  neighborhood  of 
EY,  then  t1/3(a«  -  a)  =>  aiy(O.l)  as  t  -*  oo  where  a3  =  EPi  •  Vg[EY)*CVg[EY),  and  C  = 
EYY*  -  EY  EY*.  Furthermore,  if  a3  >  0,  then  P{ae It)  —  1-6  as  t-+oo. 

We  refer  to  GLYNN  and  WHITT  (1988a)  for  a  proof  of  Theorem  2. 

A  second  refinement  concerns  running  the  simulation  according  to  a  sequential  stopping 
rule,  in  which  the  run  length  is  not  set  in  advance.  Instead,  the  simulation  is  allowed  to 
run  until  either  an  absolute  or  relative  error  precision  *  is  met.  Note  that  the  confidence 
interval  7(n)  has  half-width  *(5)«(n)/n1''3.  Thus, 

(4.2)  T(e)  =  min{n  >  1 :  *(5)j(r»)n-1^3  +  r»-1  <  *} 

is  a  sequential  stopping  rule  designed  to  stop  the  simulation  when  the  interval  half-width 
drops  below  *  (the  additional  n_1  term  is  included  to  prevent  the  simulation  from  stopping 
too  early);  this  is  an  absolute  precision  stopping  rule.  (In  a  relative  precision  rule,  «  would 
be  scaled  to  the  magnitude  of  a.)  Set  7(f)  =  7(T(«)). 

THEOREM  3:  If  £||y||3  <  oo,  g  is  continuously  differentiable  in  a  neighborhood  of  EY, 
and  <r3  =  V  g(EY)*CV  g(EY)  >  0,  then  P{aef(e)}  —  1-6  as  el  0. 

See  GLYNN  and  WHITT  (1988b)  for  a  proof. 
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A  major  concern  in  all  oi  the  confidence  interval  methodologies  described  above  is  that 
they  are  based  on  asymptotic  limit  theory.  For  small  to  moderate  run  lengths,  the  limit 
theory  may  be  misleading.  From  a  practical  standpoint,  this  manifests  itself  by  producing 
confidence  intervals  with  incorrect  coverage  levels  (e.g.  a  confidence  interval  with  a  stated 
coverage  level  of  90%  may  only  cover  a  80%  of  the  times).  Several  factors  typically  lead 
to  confidence  interval  degradation.  Firstly,  the  r.v.’s  Yi,Y2t...  may  be  highly  skewed  and 
asymmetric  so  that  the  multivariate  CLT  for  F(n)  exhibits  a  slow  rate  of  convergence. 
Secondly,  non-linearities  in  g  may  exacerbate  the  situation.  For  example,  F(n)  is  unbiased 
for  EY ,  but  s(F(n))  is  generally  biased  for  g(EY)  when  g  is  non-linear.  These  factors  lead 
to  slow  convergence  in  the  CLT  for  a(n)  which,  in  turn,  leads  to  poor  coverage  in  J(n). 
Development  of  small-sample  corrections  to  the  confidence  intervals  described  above  is  an 
active  area  of  current  research. 

5.  OUTPUT  ANALYSIS  FOR  STEADY-STATE  SIMULATIONS 

In  this  section,  we  describe  the  output  analysis  problem  for  estimation  of  steady-state 
parameters.  For  / :  E  -♦  JRd,  let  Yn  =  f(Xn)  where  X  ~  :  n  >  0)  is  the  GSSMC  associated 

with  the  queueing  system.  Suppose  the  sequence  {Yn  :  n  >  0}  is  ergodic  in  the  sense  that 
there  exists  a  finite- valued  (deterministic)  vector  m  such  that 

(5.1)  ?(n)  =>•  n 

as  n  -*  oo.  The  steadv-state  simulations  output  analysis  problem  concerns  estimating 
a  =  g(n)  where  g :  Rd  -*  R  is  given. 

In  many  applications,  the  simulator’s  interest  centers  on  steady-state  characteristics 
of  the  process  Q,  as  opposed  to  X.  Specifically,  for  h  :  S  ->  JRd ,  interest  frequently  is 
concentrated  on  the  long-run  behavior  of 

i  J*  h(Q(s))ds 

and,  more  generally, 

(5.2)  a  =  *(  lim  ^  /  /»(Q(*))<f«) 

(if  the  limit  exists)  for  k:  Rd  -*  JR. 

EXAMPLE  6:  If  Q  is  the  queue-length  process  for  an  open  queueing  network  and  fc(x)  = 
*!  +  ••■  +  *a,  then  a  (as  defined  by  (5.2))  corresponds  to  the  steady-state  expected  total 
number  of  customers  in  the  system. 


EXAMPLE  7:  To  calculate  the  steady-state  variance  of  the  total  number  of  customers 
in  the  system,  let  h(q)  =  ((£?9<)3.  £?*)  and  set  k(yuVi)  =  yx  -  yf . 


EXAMPLE  8:  To  measure  the  steady-state  ratio  of  customers  at  two  stations  *  and  j  in 
a  queueing  network,  let  Q  be  the  corresponding  vector  queue-length  process,  h{q)  =  (g,,?,) 
and  set  k(y i.ya)  =  yi/ya- 

The  next  result  shows  that  the  estimation  problem  (5.2)  is  but  a  special  case  of  the 
steady-state  simulation  problem  for  the  chain  X.  (For  the  proof,  see  the  Appendix.) 

PROPOSITION  2:  For  h  :S  ->  Rd,  suppose  that 


ni?o 


where  axe  finite  (deterministic)  constants  and  m  >  o.  Then, 


~  M<?(»))*  “  Mi/Ma 


as  t  —  oo  and  (5.2)  is  a  special  case  of  the  steady-state  simulation  problem,  in  which  f(a,c)  = 
(A(«)t*(«,c),t*(»,c))  and  g :  St*+l  -*  Mi  is  defined  by  g[x -  k[x i/x4+l . *d/*d+i). 

Before  proceeding  to  the  discussion  of  simulation  methodology  for  the  steady-state 
simulation  problem,  we  will  detour  (for  a  moment)  to  discussing  conditions  guaranteeing 
the  validity  of  (5.1).  Such  conditions  show  that  the  steady-state  simulation  problem  is 
well-posed.  A  key  to  the  analysis  is  the  Markov  chain  X.  An  extensive  body  of  theory  has 
been  developed  to  study  the  recurrence  structure  of  GSSMC’s.  One  particularly  powerful 
approach  involves  proving  that  the  chain  X  is  recurrent  in  the  sense  of  Harris.  Specifically, 
the  chain  X  is  said  to  be  Harris  recurrent  if  there  exists  a  set  A  c  E,  a  positive  number  A, 
an  integer  n  >  1,  and  a  probability  distribution  <p  on  E  such  that: 

(5.3)  i)  P{Xn*A  infinitely  often  |X0  =  («,  c)}  =  1  for  all  («,c)eE 
ii)  P{X„€  \X0  =  (», c)}  >  A <p{  )  for  all  (*,  c)eA. 
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In  most  applications,  A  is  typically  a  compact  subset  of  E  and  condition  ii)  translates 
into  a  requirement  that  the  measures  P{Xnt  ■  |Af0  =  (*,  c)}  have  a  common  density  compo¬ 
nent  which  is  uniformly  bounded  (in  («,  e)eA)  away  from  zero;  if  the  density  component  is 
continuous  in  («,c),  the  compactness  of  A  often  suffices  to  obtain  a  uniform  lower  bound  on 
the  densities.  This  discussion  suggests  that  we  can  reasonably  expect  many  GSMP’s  to  be 
Harris  recurrent,  particularly  when  considering  those  GSMP’s  for  which  the  distributions 
have  continuous  positive  (Lebesgue)  density  components. 

An  important  observation  in  the  study  of  Harris  chains  is  that  condition  (5.3)  permits 
one  to  “split”  the  transition  function  P{Xne  ■  =  (»,e)}  over  the  set  A: 

(5.4)  P{Xns  ■  \X0  =  (a,c)}  =  A*(.)  +  (1  -  A)Q((*,  c),  •) 

for  (a,  c)eA ,  where  Q((a,  c),  -)  is  a  probability  distribution  on  E.  Condition  (5.4)  states  that 
if  Xf>eA,  the  distribution  of  XT>+n  is  determined  by  a  “coin  flip”  in  which  the  probability 
of  success  is  A.  If  the  coin  flip  is  successful,  XT>+n  is  distributed  according  to  <p(  )  (inde¬ 
pendently  of  the  position  of  X  at  time  r*),  whereas,  if  the  coin  flip  is  unsuccessful,  XT'+n  is 
distributed  according  to  Q(Xr>,  )' 

Note  that  condition  (5.3)  i  guarantees  that  A  is  hit  infinitely  often  by  X.  Since  the 
probability  of  a  successful  coin  flip  is  A  at  each  visit  to  A,  a  “geometric  trials”  argument 
shows  that  eventually  a  successful  coin  flip  must  occur;  let  r  be  the  associated  time  at 
which  X  is  distributed  according  to  <p.  It  is  evident  that  Xr  has  a  distribution  indepen¬ 
dent  of  X0 . XT-„.  This  regenerative-type  characteristic  of  the  random  time  r  can  be 

shown  to  imply  that  Harris  chains  possess  a  non-trivial  a-finite  measure  r  (unique  up  to  a 
multiplicative  constant)  which  is  stationary  in  the  sense  that 

*(•)=  f  P{Xi'  \X0  =  z)*{dz). 

J  E 

Furthermore,  if  *(-)  is  finite,  it  can  be  expressed  as 

■w-*{gi(«wW 

U=0  ) 

where  Ev{  )  denotes  the  expectation  conditional  on  the  initial  distribution  of  X  equalling  <p. 
(See  ATHREYA  and  NEY  (1978)  for  details.)  Finally,  if  E? r  <  oo  and  if  £„  ||/(^k)||)  < 

oo,  the  strong  law  (5.1)  holds: 


as  n  -*  oo  (regardless  of  the  initial  distribution  of  X);  see  REVUZ  (1984),  p.  139.  Thus,  the 
Markov  chain  theory  described  here  suggests  that  the  limit  theorem  (5.1)  will  frequently 
be  in  force  when  dealing  with  steady-state  queueing  simulations.  Developing  precise  con¬ 
ditions  on  the  basic  building  blocks  of  the  GSMP  (i.e.  S,E(»),r„,p(t';»,e),  F(-t  «',«,«)) 
that  guarantee  Harris  recurrence  of  X  is  an  active  area  of  current  research. 

An  additional  benefit  of  this  approach  is  that  if  E„r  <  oo  and  if  £*  ll/(**)ll  <  °o, 

the  regenerative-type  structure  of  X  permits  us  to  re-express  the  steady-state  limit  n  in 
terms  of  the  ratio 

r— 1 

(5.5)  M  ^  E  JWIE*r- 

fc=0 

(see  GLYNN  (1982)).  As  we  shall  see,  (5.5)  will  permit  us  to  reduce  the  steady-state 
simulation  problem  to  a  special  case  of  the  transient  simulation  problem  studied  in  Section 
4.  This  idea  lies  at  the  core  of  the  regenerative  method  of  steady-state  simulation  output 
analysis. 

Specifically,  note  that  if  E^  ||/i(5h)|lf{5fc,  Ck)  <  oo  and  **($.,£*)  <  oo,  then 

Proposition  2,  (5.1)  and  (5.5)  may  be  combined  to  prove  that 


tJo  Evy,izU*{Sk,ch) 


ty0'—  BrTZlrfaGu) 

aat-*oo.  Then,  estimating  a  as  defined  by  (5.2)  is  equivalent  to  estimating  the  parameter 
o  =  g[ErY)  where 


(EMS*)‘* ($*,<?*),  E‘*(^,cfc)) 

\*=o  fc-0  / 


and  g{yi . Vd,Vd+ 1)  =  Hvi/Vd+i . va/w+i)-  Observing  that  Y  depends  on  X  only  up  to  the 

randomized  stopping  time  r,  we  see  that  a  is  expressed  precisely  in  the  form  (4.1),  which  is 
the  transient  simulation  problem.  Thus,  all  the  transient  simulation  methodology  described 
in  Section  4  can  be  readily  applied  to  the  estimation  of  the  steady-state  parameter  a. 

The  regenerative  method  is  particularly  straightforward  to  carry  out  in  the  case  that 
the  process  Q  =  (Q(t) :  t  >  0)  is  an  5-valued  continuous-time  Markov  chain  (CTMC).  This 
typically  occurs  when  all  the  distributions  F{x)t’,e',s,e)  are  exponential  (i.e.  of  the  form 
1  -  exp(-A(«', «',«,«)*)  for  *  >  0).  If  Q  is  an  irreducible  positive  recurrent  CTMC,  it  is  well 
known  that  the  process  Q  “regenerates”  at  those  instants  at  which  Q  enters  a  fixed  state, 
say  y*5. 
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Thus  the  regenerative  structure  is  available  to  the  simulator  without  having  to  explic¬ 
itly  “split”  the  transition  function  of  X  as  in  (5.4).  In  particular,  if  r(y)  =  inf{n  >  1 :  Sn  =  y} 
and  £»£fcJo"l(IIM5*)l!  +  i)t*(S*,C*)  <  oo  (Ew()  is  the  expectation  operator  for  the  CTMC 
conditional  on  Q[0)  =  y),  then  a  (as  defined  by  (5.2))  takes  the  form  a  =  g(EvY(y))  where 

y(y . . .  =  k(yihd+i . Vd/vd+i )  and  K(y)  is  given  as  in  (5.6),  with  r(y)  playing  the 

role  of  r.  In  applying  the  transient  methodology  of  Section  4  to  the  regenerative  steady- 
state  analysis  of  CTMC’s,  we  recall  that  the  algorithm  involves  simulating  i.i.d.  copies  of 
the  chain  Q  from  time  0  to  time  A(r(y)).  An  interesting  feature  of  the  algorithm  is  that 
because  of  the  fact  that  the  evolution  of  Q  is  independent  over  “cycles”  formed  by  entrance 
times  to  y,  the  above  approach  is  equivalent  to  simulating  one  copy  of  Q  until  time  A(r„(y)), 
where  r„(y)  in  the  time  of  the  n’th  visit  of  (S„  :  n  >  0)  to  y.  Thus,  forming  the  transient 
estimate  a[n)  =  ?(?(»))  is  equivalent  to  simulating  Q  for  A(r„(y))  time  units  and  computing 
k  ((hoQ)  (A  (r„(y)))),  where  (hoQ)  (t)  =  t-1  /*  h(Q(»))d». 

Given  that  any  state  yeS  may  be  chosen  as  the  “regeneration”  state,  it  is  natural  to  ask 
whether  the  efficiency  of  the  regenerative  method  for  estimating  a  —  k  (lim^oo  (h  o  Q)  (t)) 
is  affected  by  the  choice  of  the  regeneration  state.  By  efficiency,  we  refer  to  the  quality 
of  the  estimator  available  after  t  units  of  computer  times  have  been  expended.  Theorem 
2  (together  with  the  regenerative  method,  which  permits  us  to  estimate  a  via  yfPnfy))) 
shows  that  the  regeneration  state  *  is  more  efficient  than  y  if  <r3(*)  <  «r2(y),  where  <r3(-)  = 
E0i(  )Vg(EY(  )yc(  )Vg(EY(  ))  (clearly,  Vy(£T),  and  C  all  depend  on  the  regeneration 
state) .  It  seems  reasonable  to  assume  that  the  computational  effort  p3-  required  to  generate 
Q  over  the  j’th  cycle  takes  the  form 

&  =  £*(*») 

k 

for  some  function  x(-)>  where  the  sum  is  over  the  states  visited  on  the  y’th  cycle. 

PROPOSITION  3:  Suppose  Q  is  a  positive  recurrent  irreducible  CTMC,  and  that  fc(-) 
is  differentiable.  If  x(y)  >  0  for  all  y  and  if  Efii(s)  <  oo,  £'||y(*)||J  <  oo  for  some  *,  then 
<ra(y)  =  <7a(*)  for  all  yeS, 

For  the  proof,  see  the  Appendix.  Thus,  the  efficiency  of  the  regenerative  method  for 
CTMC’s  is  independent  of  the  choice  of  the  regeneration  state.  (This  result  does  not  extend 
in  general  to  the  regenerative  method  as  based  on  the  “splitting”  approach  of  (5.4).) 


A  limitation  of  the  regenerative  technique  described  above  for  queueing  systems  (based 
on  the  “splitting”  method)  is  that  it  takes  significant  effort  (both  analytically  and  in 
programming  time)  on  the  part  of  the  simulator  to  adapt  software  so  as  to  identify  the 
regeneration  times.  An  alternative  approach  to  developing  methodologies  for  the  steady- 
state  simulation  problem  requires  strengthening  (5.1)  to  a  functional  central  limit  theorem 
(FCLT)  hypothesis  on  (Yn  :  n  >  0),  where  Yn  =  f(Xn):  There  exists  a  matrix  A  such  that 


(5.7) 


=►  AB[t) 


as  n  -*  oo  in  D[0,  oo)  (the  Skorohod  space  of  right  continuous  functions  with  left  limits  having 
domain  (0,oo)  and  range  JZ**),  where  £(-)  is  a  standard  Brownian  motion  on  JR*  (so  that 
EB(a)B[»)*  =  a  •/).  MAIGRET  (1978)  proved  FCLT’s  of  the  form  (5.7)  for  Harris  chains,  by 
exploiting  the  martingale  central  limit  theorem.  (Although  her  proof  was  stated  for  d  =  l, 
the  result  is  easily  extended  to  our  current  setting.)  Thus,  (5.7)  is  a  hypothesis  which  one 
can  reasonably  expect  to  hold  for  a  great  many  queueing  systems. 

Let  a(n)  =  y(P(n))  be  the  point  estimator  for  the  steady-state  limit  a.  The  following 
result  is  easily  established,  and  summarizes  the  behavior  of  a(n). 


PROPOSITION  4:  Suppose  (5.7)  holds  and  let  C  =  A* A.  If  g  is  continuous  at  ft,  then 
a(n)  =»•  a  =  g[y)  as»»-»oo.  If,  in  addition,  g  is  differentiable  at  ft,  then  nl/3(a(n)-a)  =>  <rJ\T(o,  l) 
as  n-oo,  where  a 3  =  Vg(ftycVg{ft). 

Proposition  4  shows  that  if  one  cam  construct  an  estimator  »(n)  =*•  |<r|  as  n  -♦  oo,  then 
P{ae[a(n)  -  +  *($)*(n)/n1/3]}  -»l-fasn-»oo  whenever  a1  >  0.  Thus,  a 

confidence  interval  methodology  is  easily  obtained,  once  one  is  given  a  consistent  estimator 
for  a2.  As  a  consequence,  the  development  of  consistent  estimators  for  the  parameter 
a3  appearing  in  Proposition  4  is  the  central  problem  of  steauiy-state  simulation  output 
analysis. 

To  construct  a  consistent  estimator  for  or3,  observe  that  if  g  is  continuously  differentiable 
at  ft,  then  (5.7)  implies  that  Vy(P(n))  =►  Vg{ft)  as  n  -»  oo.  Thus,  the  hard  part  of  the 
estimation  problem  deals  with  estimating  C.  To  estimate  C,  observe  that  if  (n||F(r»)  -  m||3  : 
n  >  1}  is  uniformly  integrable,  then 


(5.8) 


C  =  lim  n  ■  £(P(n)  —  ^)(P(n)  —  ft)*. 

F8— *00 

17 


Suppose  (Yn  :  n  >  0)  is  strictly  stationary  with  l|£(*o  -  t*)(Yn  -  m)*II  <  By  computing 
the  right-hand  side  of  (5.8),  one  easily  shows  that 

(5.9)  C  =  E(Yo  -  M)(y0  -  M)‘  +  f)  E(Yo  -  n){Yh  -  M)‘  +  E(Y„  -  ^)(y0  -  m)‘- 

k=l 

Several  different  methods  (e.g.  spectral  techniques,  autoregressive  methods)  for  estimating 
C  rely  on  the  representation  (5.9);  see  Chapter  3  of  BRATLEY,  FOX,  and  SCHRAGE 
(1987).  Recall  that  (5.9)  only  holds  exactly  when  the  sequence  (Y„  :  »  >  0)  is  strictly 
stationary.  Typically,  a  simulation  of  the  GSSMC  X  gives  rise  to  a  non-st  at  ionary  process, 
since  X0  will  usually  have  a  non-stationary  distribution.  In  order  to  justify  the  use  of 
stationary  process  ideas  in  a  queueing  simulation  context,  it  is  of  some  comfort  to  recognize 
that  if  A  is  an  aperiodic  Harris  chain  with  a  finite  invariant  measure,  then 

(Xn+h  ’■  k>0)  =>  (XJ  :k>0) 

(weak  convergence  in  the  product  topology)  as  n  -»  oo,  where  X*  =  (XJ  :  k  >  0)  is  strictly 
stationary  (see  REVUZ  (1984),  p.  198).  Thus,  one  expects  that  many  queueing  systems  are 
at  least  asymptotically  stationary.  This  suggests  that,  at  least  in  a  large-sample  context, 
algorithms  based  on  a  stationarity  assumption  should  behave  reasonable  well. 

As  in  the  transient  simulation  setting,  it  is  often  more  convenient  to  define  sample 
size  in  terms  of  a  budget  constraint  t  on  the  computer  time.  As  in  the  transient  case, 
let  £i,/9a, . . .  be  the  amounts  of  computer  time  required  to  generate  Yi,K3,...  respectively. 
Then,  «(*)  =  max{n  >  0 :  Pk<*}  is  the  number  of  observations  generated  in  t  time  units. 

The  natural  point  estimate  for  a  is  at  =  a(*(t)). 

THEOREM  4:  Suppose  that  (5.7)  holds  and  that  g  is  differentiable  at  p.  If  »-1  fik  -* 
7  a.s.  as  n  -m»  (where  7  is  a  finite  positive  deterministic  constant) ,  then  tl/3(at  -<*)=► 
<rN[ 0, 1)  as  t  -*  00,  where  a3  =  7 Vp(^)*CVy(p).  If,  in  addition,  a3  >  0  and  «<  =>  a  as  t  -*  00, 
then  P{et«[at  -  a,  +  n{6)$tltll3\}  as  t  -*  00. 

For  a  proof,  see  GLYNN  and  WHITT  (1988a) 

An  analogue  of  the  sequential  stopping  rule  methodology  for  transient  simulations  also 
extends  to  the  steady-state  setting.  Let  »(n)  be  an  estimator  for  (Vy(/i)*CV3(^j))1''3  based  on 
the  observations  Yi . Yn  and  let  T(e)  be  defined  as  in  (4.2). 

THEOREM  5:  Suppose  that  (5.7)  holds  and  that  g  is  differentiable  at  n.  If  «(n)  -* 
(V gfay CV g(n))lt3  >  0  a.s.  as  n  — *  00,  then  P{ae[a(T(e))  —  *,  a(T[t))  +  e]}  — ♦  1  —  6  as  e  |  0. 
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For  a  proof,  see  GLYNN  and  WHITT  (1988b). 


6.  RUN-LENGTH  DETERMINATION  FOR  QUEUEING  SIMULATIONS 

As  indicated  in  Sections  4  and  5,  a  major  concern  of  simulators  centers  on  the  issue 
of  the  amount  of  computer  time  t  required  to  calculate  a  queueing  parameter  a  to  within 
a  given  precision  «.  One  approach  to  the  problem  involves  using  a  sequential  stopping 
rule  of  the  form  (4.2).  Several  difficulties  with  the  method  are  apparent.  Firstly,  prior 
to  initiating  such  a  simulation,  the  simulator  will  typically  have  no  idea  as  to  how  much 
computer  time  the  simulation  will  use  in  computing  the  T(t)  observations  determined  by 
the  stopping  rule.  Thus,  the  simulation  may  significantly  exceed  the  desired  computer 
time  budget  (or,  alternatively,  the  simulator  may  need  to  terminate  the  simulation  early) 
when  such  a  sequential  algorithm  is  used.  Secondly,  the  nature  of  the  random  sample 
size  T(c)  can  introduce  certain  contaminating  effects  into  the  simulation  output  analysis 
procedure  that  generally  would  not  be  present  (or,  at  the  least,  would  be  present  in  smaller 
quantities)  in  a  deterministic  run-length  simulation.  For  example,  such  sequential  methods 
are  prone,  particularly  for  largish  «,  to  early  termination  because  of  an  overly  optimistic 
variance  estimate  caused  by  the  observations  Yu  Y9, . . . , Kr(«)  clustering  closely  together  (due 
to  random  circumstance).  However,  it  turns  out  (fortunately)  that  such  contaminating 
effects  decrease  to  zero  as  «  i  0.  As  a  consequence,  the  sequential  stopping  rules  of  Sections 
4  and  5  continue  to  form  an  important  class  of  simulation  algorithms,  particularly  for  « 
small. 

Nevertheless,  the  above  discussion  indicates  that  approximation  of  the  simulation  run- 
length  required  to  compute  a  to  within  t  can  be  a  valuable  tool  to  the  simulator.  The 
principal  application  is  to  determine  whether  a  given  (planned)  simulation  is  feasible.  For 
example,  if  the  approximate  run-length  is  prohibitively  large  for  the  precision  «,  the  simu¬ 
lation  would  either  need  to  be  abandoned,  be  made  more  efficient,  or  the  permissible  error 
level  «  would  need  to  be  increased.  Even  if  feasibility  is  guaranteed  by  the  approxima¬ 
tion,  the  approximation  has  further  applications.  The  approximate  run-length  could  be 
used  to  assign  a  (deterministic)  computer  budget  t  to  the  simulation,  thereby  avoiding  the 
contamination  effects  described  earlier. 

As  a  first  step  to  developing  an  approximation,  note  that  if  a,  is  an  estimator  for  a 
satisfying  t1/a(or,-o)  =►  <7^(0,  i),  the  100(1-5)%  confidence  interval  has  length  approximately 


2|<t| z{6)/tll\  Assuming  a  0,  this  suggests  that  the  run-length  required  to  obtain  a 
100(1  —  confidence  interval  for  a  of  length  e|a|  is  approximately  <cr3x3(6)/e?a2.  Assuming 
that  {t(at  -  a)3  :  t  >  0}  is  uniformly  integrable,  a1  can  be  evaluated  as 

a3=  Umt-  MSE( at). 

OO 

where  MSE(at)  =  E(at  -  a)3.  Thus,  the  key  to  developing  appropriate  run-length  approxi¬ 
mations  is  an  estimate  of  MSE(at)  for  large  t. 

ASMUSSEN  (1987)  and  WHITT  (1987)  have  studied  the  run-length  approximation 
problem  for  queues  in  heavy  traffic.  Specifically,  consider  the  problem  of  estimating  the 
steady-state  waiting  time  EW  of  a  customer  in  the  GI/G/ l/oo  queue  for  a  traffic  intensity 
just  below  unit.  Assuming  that  EW  is  estimated  by  »~1£!£ow'*  {wo,Wu  are  the  waiting 
times  of  successive  customers)  and  that  the  computer  time  required  to  generate  W0, lV»-i 
is  n,  it  is  argued  there  that 


MSE{ch)  =  tE  (^pWk~  Ew\ 


«  c  (1  -  p)~2 

for  some  constant  c.  Thus,  in  order  to  obtain  estimates  for  EW  of  high  relative  precision, 
it  is  necessary  for  t  >  (1  -  p)“3.  As  pointed  out  by  Asmussen  and  Whitt,  a  similar  anal¬ 
ysis  can  be  formally  carried  out  for  more  general  networks  of  queues  in  which  diffusion 
approximations  hold. 

An  important  feature  of  this  theory  is  that  it  forcefully  points  out  the  substantial 
amounts  of  computer  time  required  to  obtain  (relatively)  accurate  solutions  to  queues  in 
heavy  traffic. 

7.  VARIANCE  REDUCTION  TECHNIQUES  (VRT’S) 

The  idea  underlying  variance  reduction  is  to  exploit  the  stochastic  structure  of  the 
queueing  model  under  consideration,  in  order  to  obtain  improved  computational  efficiency. 
The  stochastic  structure  can  often  be  used  to  construct  a  variety  of  alternative  estimators 
for  the  quantity  a  to  be  estimated.  The  remainder  of  this  section  will  be  devoted  to  a 
discussion  of  how  to  choose  the  most  “efficient"  estimator  in  the  class  available  to  the 
simulator. 

Consider  two  competing  estimators  <*i(n)  and  a3(n).  Suppose  that  <*<(n)  is  obtained  by 
generating  a  sequence  (Y5,  :  1  <  j  <  n),  and  that  the  time  required  to  generate  Yi,  is  given 


.VVVVKVV  A.va 
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by  fa,.  Thus,  the  time  required  to  generate  a,(n)  is  given  by  ptl  +  ...  +  0in.  Conversely, 
given  a  computer  budget  t,  the  estimator  available  at  time  t  is  cut  =  a^A^t)),  where 
Ni[t)  =  max{n  >  0  :  £*=1  ft*  <  t }. 

Assume  that  the  estimators  a<(n)  (»  =  1,2)  satisfy  FCLT’s:  There  exists  finite  <r<  and 
*  <  1  such  that 

(7.1)  n1/a(ta<((nfj)  -  ta)  =>  <7,i?(f) 

as  »  -♦  00,  in  D[t,  00),  where  B{  )  is  a  standard  real-valued  Brownian  motion.  If  we  see  t  =  1 
in  (7.1)  and  assume  that  (n(a<(n)  ~  a)a  :  n  >  1}  is  uniformly  integrable,  we  find  that 

var  an  (n)  =  o}/n  +  o(  1/n). 

We  shall  therefore  say  that  a3(n)  has  lower  variance  than  the  estimator  ai(n)  if  0}  <o\. 
Techniques  which  give  rise  to  a  new  estimator  a3(n)  having  lower  variance  than  the  original 
estimator  ax(n)  are  called  variance  reduction  techniques  (VRT’s). 

To  understand  (7.1)  better,  suppose  that  a<(n)  =  *(£(«)),  where  £(n)  =  n-1  Yik. 
This  comprises  a  class  of  estimators  extensively  studied  in  both  Sections  4  and  5.  If  the 
FCLT  (5.7)  holds  for  $<(»),  and  *  is  continuously  differentiable  in  a  neighborhood  of  m 
is  the  centering  constant  appearing  in  (5.7)),  then  (7.1)  is  valid  with  af  =  V gi(mY A*AiV 
(A  is  the  scaling  matrix  appearing  in  (5.7)). 

It  might,  at  first,  seem  reasonable  to  choose  the  estimator  with  the  smallest  possible 
variance.  However,  this  choice  neglects  the  fact  that  the  simulation  of  the  sequence 
(Yij  :  j  >  0)  may  be  considerably  cheaper  for  one  estimator  than  the  other.  Thus,  the 
computer  time  needed  to  generate  observations  should  be  factored  into  the  decision  as  to 
which  estimator  to  use.  This  can  be  considered  mathematically  by  analyzing  the  estimators 
ait  (*  =  1,2)  rather  than  the  a,(n)’s.  The  following  result  generalizes  Theorems  2  and  4  (see 
GLYNN  and  WHITT  (1988a)). 

THEOREM  0:  Suppose  (7.1)  holds  and  that  -*  7 <  a.s.  as  n  -*  00  with  7< 

finite,  deterministic,  and  positive.  Then,  tl/a(a<,  -  a)  =»  »?,-lV( 0, 1)  as  t  -*  00  where  7?  =  no}. 

Thus,  the  appropriate  figure  of  merit  in  comparing  the  computational  efficiency  of  two 
competing  estimators  ai(n),  a 3(n)  is  the  quantity  =  l/no}-  The  parameter  e,  is  called 
the  efficiency  of  the  estimator  a,.  Thus,  the  estimator  a3(n)  is  said  to  be  more  efficient 


than  ai(n)  if  e3  >  «i.  Techniques  which  give  rise  to  a  new  estimator  aa(n)  having  greater 
efficiency  than  the  original  estimator  aj(n)  are  called  efficiency  improvement  techniques 
(EIT’s). 

In  much  of  the  simulation  literature,  the  parameters  71,73  are  ignored  in  the  analysis 
of  computational  efficiency.  Thus,  the  class  of  EIT’s  is  then  identical  to  the  class  of 
VRT’s.  There  are  several  reasons  for  ignoring  the  effect  of  71,73.  First,  in  many  VRT’s, 
the  additional  computation  required  to  form  aa(n)  from  <*i(n)  is  believed  to  be  small,  so 
that  71  «  73.  Secondly,  the  exact  size  of  the  7<’s  is  hard  to  quantify  from  a  mathematical 
standpoint,  since  the  7< ’s  typically  reflect  the  quality  of  the  programming  implementation, 
as  well  as  various  machine  dependent  considerations. 


8.  VRT  1:  CONTROL  VARIATES 

Suppose  a(n)  is  a  real- valued  estimator  for  a  and  that  (£(n)  :  n  >  1)  is  an  2Rd-valued 
sequence  with  mean  zero.  A  sequence  (£(»)  :  n  >  1)  which  is  known  to  converge  to  zero 
asymptotically  is  called  a  control  variate  sequence.  The  idea  behind  control  variates  is  that 
the  correlation  structure  between  a(n)  and  6(n)  earn  be  fruitfully  used  to  obtain  a  variance 
reduction. 

Suppose  that  a(n)  and  5(n)  satisfy  a  joint  d  +  1  dimensional  multivariate  CLT:  There 
exists  a  finite-valued  matrix  A  such  that 

(8.1)  nx/a(a(n)  -  a,  *(n))  =»  i4iV(0, 1) 

as  n  ->  ».  Note  that  (8.1)  implies  that  6{n)  =>  0  as  n  -•  00.  To  exploit  the  correlation 
between  a(n)  and  6(n),  set  a(n,A)  =  a(n)  -  XtS(n)  for  XeJRd.  Observe  that  the  continuous 
mapping  principle,  as  applied  to  (8.1),  yields 

(8.2)  n1/a(a(n,  A)  -  a)  =►  <r(A)JV(0, 1) 

as  n-*».  The  CLT  (8.2)  shows  that  the  new  estimator  a(n,  A)  is  also  consistent  for  o,  in 
the  sense  that  a(n,  A)  =>  a  as  n  -» 00.  Since  A  is  at  our  disposal,  we  may  choose  it  so  as  to 
minimize  v2(A).  To  analyze  <ra(A),  write  C  =  A*  A  in  the  form 


'■(^+5) 


where  D  is  dx  d  and  b  is  d  x  1.  If  the  covariance  matrix  D  is  positive  definite,  it  is  easily 
shown  that  the  quadratic  form  <ra(A)  is  minimized  at 


and  that  <r2(A*)  =  <r3-btD~1b.  Of  course,  in  a  typical  simulation  context,  the  value  of  A*  will 
need  to  be  estimated  from  data  (although  in  certain  applications,  D  may  be  computable 
prior  to  initiating  the  simulation).  Assuming  (An  :  n  >  0)  is  consistent  for  A*  in  the  sense 
that  A*  =>  A*  as  n  -*  oo,  a  converging-together  argument  proves  that 

»1/a(a(n,A„)  -  a)  =*  <r(A*)N(0, 1) 


as  n  -*  oo.  Thus,  the  “controlled"  estimator  a(n,A„)  achieves  a  variance  reduction  as 
compared  with  the  original  estimator  <*(n).  To  determine  whether  a(n,A„)  achieves  an 
efficiency  improvement,  we  of  course  need  to  balance  the  additional  cost  of  computing 
S(n)  and  An  from  the  simulation  against  the  variance  reduction  obtained.  This  trade-off 
is  determined  both  by  the  estimation  problem  at  hauid  and  the  choice  of  control  variables 
used. 

A  particularly  convenient  set  of  control  variates  is  typically  available  for  the  steady- 
state  simulation  of  queueing  networks.  For  example,  consider  a  queueing  network  with  d 
stations  and  t  customer  types.  Suppose  that  Vif  =  (Viy(Jt) :  k  >  1)  is  the  sequence  of  customer 
service  times  for  customers  of  type  j  at  station  ».  Assume  that  the  sequence  V;,  is  made 
up  of  i.i.d.  r.v.’s  with  common  mean  m,  and  finite  variance  .  Let  Ntj(n)  be  the  number 
of  service  completions  for  type  j  customers  at  station  »  in  the  first  n  transitions  of  the 
GSSMC  X  =  (X„  =  (S„,C„)  :  n  >  0).  Choose  a  set  £  £  {(»,;')  :  1  <  *  <  d,  1  <  j  <  t}\  this  will 
correspond  to  the  set  of  control  variates  used.  Specifically,  let 


(8.4) 


6(n) 


E  vain-** 

km  l 


:  (t,;>£ 


The  following  result  is  straightforward  to  prove,  and  shows  that  4(n)  is  a  control  variate. 


THEOREM  7:  Suppose  that  (Vi,- :  {»,/)«£)  is  a  collection  of  independent  sequences.  Then, 
if  Ni}  (n) /n  ^  as  n  -*  oo  where  vy  is  finite,  positive,  and  deterministic  (for  all  (i,j)tB),  it 
follows  that 

nl/3*(n)  =>  Ac^(0,/) 

where  D  =  A*eAc  satisfies  D  =  diag(<r?  :  («,;')*£). 

The  matrix  D  appearing  in  Theorem  7  is  precisely  the  D  of  (8.3).  Hence,  the  control 
variates  defined  by  (8.4)  have  known  variance  structure;  see  BAUER,  VENKATRAMAN, 
and  WILSON  (1987)  for  further  details  on  how  to  exploit  this  fact. 
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LAVENBERG,  MOELLER,  and  WELCH  (1982)  examined  control  variate  sequences 
based  on  (8.4)  as  well  as  several  extensions  which  included  information  on  the  routing 
structure  of  the  network.  A  family  of  controls,  called  work  variables,  was  found  to  be 
particularly  effective.  The  ratio  of  variance  reduction  <r2(A*)/«r2  ranged  from  0.2  to  0.95  for 
the  models  simulated. 

9.  VRT  2:  INDIRECT  ESTIMATION 

As  the  discussion  of  the  previous  six  sections  has  suggested,  discrete-event  simulations 
for  queueing  systems  are  most  naturally  formulated  in  terms  of  the  queue-length  process 
Q  =  (Q{t) :  t  >  0).  In  many  applications,  the  simulator  may  be  more  interested  in  estimating 
characteristics  of  certain  waiting  times  in  the  system.  An  important  identity  which  can  be 
used  here  is  Little’s  Law. 

Let  Qt  =  (<?r(0  :  t  >  0)  be  the  total  number  of  customers  in  some  subnetwork  of  the 
system.  Let  (( An,Dn ) :  n  >  l)  be  the  sequence  in  which  An,Dn  correspond  to  the  arrival  and 
departure  times  of  the  n’th  customer  to  the  subnetwork.  Then,  Wn  =  Dn-An  is  the  amount 
of  time  a  customer  “waits”  in  the  subnetwork.  The  following  result  is  Little’s  Law. 

THEOREM  8:  If  A„/n  -*  A-1  a.s.  and  n~1jyHmlWk  -*  w  an.  (0  <  A, to  <  oo),  then 


^j*QT(s)d, 


q  a.t. 


as  t  — ►  oo  and  q  —  Xw. 


Thus,  the  6teady-state  mean  waiting  time  w  can  be  estimated  as  w  -  X~lq  =  y(A_1,j) 
where  g(x,  y)  =  xy.  So,  provided  that  we  record  both  the  queue-length  process  Qt  and  the 
arrival  times  (A„  :  n  >  1)  we  can  estimate  w  by  techniques  of  the  type  described  in  Section 
4  (without  having  to  observe  the  waiting  times  directly). 

In  most  queueing  simulations,  however,  both  the  waiting  time  sequence  W  =  :  n  >  1) 

and  queue-length  process  QT  are  directly  observable.  The  question  then  arises  as  to  whether 
to  estimate  the  steady-state  mean  waiting  time  u>  directly  or  to  base  the  estimation  on  the 
identity  u>  =  A ~lq.  As  discussed  in  Section  7,  the  criterion  for  selection  is  efficiency.  GLYNN 
and  WHITT  (1988c)  analyze  this  situation  by  assuming  a  joint  FCLT  for  the  .An’s  and  H^’s: 
There  exists  a  finite-valued  matrix  AL  such  that 


»1/3  “  tA"l>  “  Xj Wk  ~  tu,j  ^  ALBlt) 


OSH 


as  n  ->  co  in  .0(0,00),  where  B(  )  is  a  standard  two-dimensional  Brownian  motion.  To 
compute  the  direct  estimator  n-1  requires  observing  the  queueing  simulation  up 

to  the  time  OJ,  =  max{Z)*  :  1  <  k  <  n}.  Thus,  compared  with  the  product  estimator  • 

(ffm  QT(s)ds/D'n).  The  following  result  can  be  found  in  GLYNN  and  WHITT  (1988c). 

THEOREM  9:  If  (9.1)  holds  with  0  <  A,  w  <  00,  then 


as  n  — »  00. 


From  (9.1)  and  Theorem  9,  it  follows  that  both  estimators  have  the  same  asymptotic 
variance  parameter  (namely,  cr3 ,  the  (2,2)  element  of  A*LAL).  Thus,  there  is  no  difference, 
in  terms  of  (asymptotic)  efficiency,  betv  n  the  direct  estimator  for  w  and  the  product 
estimator  based  on  w  =  \~lq. 

However,  in  many  queueing  simulations,  there  is  additional  stochastic  structure  that 
can  be  exploited.  In  particular,  one  can  often  calculate  A  prior  to  running  the  simulation. 
This,  in  fact,  is  the  typical  situation  where  Qr  corresponds  to  the  total  number  of  customers 
in  an  open  queue;  A  is  then  just  the  external  arrival  rate  to  the  network.  Thus,  the  direct 
estimator  for  w  can  be  compared  with  the  indirect  estimator  A_1(/0D*  QT(»)ds/D'n).  To 
compare  efficiencies,  suppose: 

(9.2)  i)  {n_1[(j4„  -  nA-1)3  +  (££=1  Wh  -  nw)3] :  n  >  l}  is  uniformly  integrable 

ii)  E{Wj\Ui  =  u}  is  non-increasing  in  u  for  all  »,  j  >  1,  where  l/<  =  A<  -  1. 

The  second  condition  states  that  the  waiting  times  are  (roughly  speaking)  non-increasing 
in  the  inter-arrival  times.  This  is  a  condition  which  we  would  expect  to  hold  in  many 
queueing  systems. 

THEOREM  10:  If  (9.1)  and  (9.2)  hold,  then 


*1/a  Jo  Qr[d)d»  -  u>j  =>  <7,at(o,: 


where  a\  >  a\. 

See  GLYNN  and  WHITT  (1988c)  for  the  proof.  We  conclude  that  the  direct  estimator 
for  w  is  usually  more  efficient  than  the  indirect  estimator. 

A  similar  analysis  can  be  carried  out  for  the  problem  of  estimating  the  steady-state 
queue-length  parameter  q.  Note  that  the  identity  q  =  \u>  suggests  that  in  the  presence  of 


,WWV.rW^.'V'.rv>  ■WTW  WT>’^V  '>  "_M  '  V 


known  A,  the  estimation  of  q  and  w  are  equivalent  problems.  As  a  consequence,  we  can 
conclude  from  Theorem  10  that  the  estimator  An-1  Wk  is  usually  more  efficient  for 
calculating  q  than  the  more  obvious  estimator  /0D*  QT(d)da/D'n.  For  numerical  illustrations, 
see  LAW  (1975). 

10.  VRT  3:  DISCRETE-TIME  CONVERSION 

Suppose  that  Q  =  (Q(t)  :  t  >  0)  is  an  5-valued  non-explosive  CTMC  and  consider  the 
problem  of  estimating  a  =  g[u)  via  simulation,  where 


=  Ef 


*(»(*)) 


r(Q{a))da 


and  r:S-*  Rd,  g:  lRd  -*  JR.  From  the  discussion  in  Section  5,  it  is  evident  that  the  problem 
of  estimating  the  steady-state  parameter  a  =  A(Iimt_0O  t-1  f*  A(Q(*))<f»)  is  a  special  case  of  the 
above.  The  key  idea  is  to  let  r(q)  =  (/i(?),l)  and  let  y(zi,...,z<i, *d+i)  =  A(xi/*a+i,..-i*a/*a+i). 

The  obvious  approach  to  estimating  a  =  g(u)  is  to  recognize  that  it  is  a  special  case 
of  the  transient  simulation  problem  (4.1)  and  to  employ  the  methodology  described  there. 
Let  a(n)  be  the  estimator  obtained  by  generating  n  i.i.d.  copies  (Q< :  1  <  i  <  n)  of  the  CTMC 
up  to  the  stopping  time  A(r(y)).  It  turns  out  that  a  more  efficient  estimator  can  be  found 
by  simulating  only  the  embedded  discrete-time  Markov  chain  (DTMC).  This  is  the  basis 
of  the  principle  of  discrete-time  conversion. 

Let  Z  =  [Zn  :  n>  0)  be  the  embedded  DTMC  associated  with  the  CTMC  Q  i.e.  Zn  is 
the  n’th  distinct  state  visited  by  Q.  (If  p(«';«,e)  =  0  for  all  ctE{a),  aeS,  then  Zn  =  5„.)  Note 
that  if  Ef0A(T(v))  ||r(Q(«))||<fo  <  oo,  then 


UA(r(vn  1  'Mr1 

r(Q{a))da\Z\  =  £  r(Zn)/\[Zn) 


where  A(*)  is  the  exponential  holding  time  parameter  in  z  e  5.  As  a  consequence  of  (10.1), 
v  can  be  re-expressed  as 

(10.2)  V  =  E  T,  r{Zk)/X(Zk). 

fcsO 

Let  <»*(«)  be  the  transient  estimator  for  a  =  g(v)  which  exploits  the  discrete-time  repre¬ 
sentation  (10.2).  To  be  more  precise,  let  Z(l),...,Z(n)  be  i.i.d.  copies  of  the  sequence 
Z  (3(i)  =  (Zin  :  n  >  0))  generated  up  to  time  r(y).  Then 

Qd(n)  =  y(F'(n)) 


IP 


where  =  £?io)_1  r(2*n)/A(Z*n).  The  following  theorem  compares  the  variances  of  a (n) 
and  aa(n). 


THEOREM  11:  Suppose  that  E(f^T^  ||r(<3(«))||<ia)2  <  oo  and  g  is  differentiable  at  u. 
Then 

n1/2(a(n)  —  a)  =>  oN( 0, 1) 
n1/,J(ad(n)  -  a)  =>  <7<iN(0,  1) 

Where  a2  =  var[ Vg(»Y  f0A(r(v))  r(Q(s))^]  and  a\  =  var[£{Vy(M)‘  fQr(v)  r(Q(a))da|Z}]. 

The  proof  is  essentially  that  of  Theorem  1.  It  is  a  well  known  fact  that  if  EC  <  oo,  then 
var£{f|/}  <  varf  for  any  sub  tr-field  ?  defined  on  the  same  probability  space  as  (.  Thus, 
a\  <  <r2,  and  so  aa(n)  always  delivers  a  guaranteed  variance  reduction  as  compared  with 

a(n). 

To  compare  efficiencies,  note  that  simulating  Z  to  time  r(y)  takes  less  effort  than  gener¬ 
ating  Q  to  A(r(y)),  because  we  can  dispense  with  the  need  to  generate  exponential  variates. 
Thus,  ad[n)  is  a  double  winner,  in  the  sense  that  it  improves  upon  a(n)  both  from  a  variance 
and  computation  standpoint.  For  more  on  the  method  of  Bteady-state  discrete-time  conver¬ 
sion,  see  HORDIJK,  IGLEHART,  and  SCHASSBERGER  (1976)  and  FOX  and  GLYNN 
(1986). 

11.  4:  EXTENDED  CONDITIONAL  MONTE  CARLO 

e  basic  principle  used  in  discrete-time  conversion  is  that  we  can  reduce  the  variance 
of  an  estimator  via  conditioning.  This  general  idea  is  known,  in  the  simulation  literature, 
as  the  method  of  conditional  Monte  Carlo.  In  this  section,  we  consider  an  extension 
of  conditional  Monte  Carlo  in  which  the  different  observations  over  which  the  estimator 
averages  are  conditioned  on  different  r.v.’s.  Following  BRATLEY,  FOX,  and  SHRAGE 
(1987),  p.  71,  we  call  this  method  extended  conditional  Monte  Carlo.  In  general  we  can 
not  always  expect  to  obtain  a  variance  reduction  by  using  this  method.  Nevertheless,  there 
is  a  general  class  of  queueing  simulations  for  which  the  technique  does  work. 

Given  a  real-valued  Markov  chain  (MC)  X  =  {Xn  :  n  >  0)  and  f  :  JR  -»  R,  let  Kn  =  f{Xn). 
Suppose  that  (y„  :  n  >  0)  is  ergodic  in  the  sense  that  there  exists  finite  n  such  that 


?{n)  =>  H 
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as  n  -»  oo;  our  goal  here  is  to  efficiently  estimate  the  steady-state  mean  n.  An  important 
queueing  example  of  such  a  real-valued  MC  is  the  waiting  time  sequence  {Wn  :  n  >  0)  for 
the  GI/Gf  l/oo  queue,  which  obeys  the  recursion 

(11.1)  Wn+1  =  [WH  +  T,n+1)+, 

where  the  sequence  (i?n  :  n  >  1)  is  i.i.d.  and  independent  of  W0. 

Suppose  that  X  is  a  stochastically  increasing  MC  (SIMC)  i.e.  X  can  be  represented  in 
the  form  Xn+i  =  h(Xn,rtn+i)  where  h  is  non-decreasing  in  both  arguments  and  (»;*:«>  l) 
is  an  i.i.d.  sequence  independent  of  X0.  Note  that  (11.1)  is  a  special  case  of  a  SIMC. 
Our  next  result  shows  that  it  is  often  sufficient  to  assume  only  monotonicity  in  the  first 
co-ordinate  of  h  (for  the  proof,  see  the  Appendix). 

PROPOSITION  5:  Suppose  that  X  satisfies  a  recursion  of  the  form  Xn+1  -  h(Xn,rjn+l) 
where  h  is  non-decreasing  in  the  first  co-ordinate  and  (t7„  :  n  >  l)  is  an  i.i.d.  sequence 
independent  of  X0-  If  »?„  has  a  continuous  distribution,  then  X=X*  (=  denotes  equality 
in  distribution),  where  K+t  =  ^  (^i*»«+i)«  “  non-decreasing  in  both  co-ordinates,  and 
Wn  :  n  >  l)  is  an  i.i.d.  sequence  independent  of  X'0  with  v'n^Vn- 

The  idea  behind  extended  conditional  Monte  Carlo  is  to  replace  the  estimator  F(n)  by 


F«(n)  =  iX>{/(**)l**-x>. 


Note  that  Fe(n)  is  a  sample  mean  of  the  r.v.’s  /«pfo)».-.>/«(A»»-i)  where 

/«(*)  =  £</(A'i)l*o  =  *>• 

We  assume  that  both  F(n)  and  F,(n)  satisfy  CLT’s:  There  exists  a,  <rt  such  that 

n1^a(F(n)  —  n)  =>  trN(0, 1) 

(11.2) 

nl/3(Fe(n)  -  A»)  =>  <r*JV(0, 1) 
as  n  -*  oo.  Furthermore,  assume  that 

{«{(F(f.)  -  a*)3  +  (F.(»)  -  #*)3J :  »  >  1} 

(11.3) 

is  uniformly  integrable 

The  following  theorem  shows  that  extended  conditional  Monte  Carlo  reduces  variance  if  / 
is  monotone  and  X  is  a  SIMC. 
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THEOREM  12:  Assume  (11.2)  and  (11.3).  If  X  is  a  SIMC  and  /  is  monotone,  then 

For  the  proof,  see  the  Appendix.  Because  of  the  stochastic  monotonicity  that  holds 
in  many  queueing  networks,  we  expect  that  extended  conditional  Monte  Carlo  should  also 
prove  useful  in  situations  where  Q  is  vector-valued.  For  a  result  closely  related  to  Theorem 
12,  see  ROSS  (1988).  The  efficiency  of  the  method  depends  on  the  amount  of  variance 
reduction  and  the  ease  of  computing  /«. 

12.  VRT  5:  COMMON  RANDOM  NUMBERS 

The  idea  underlying  common  random  numbers  (CRN’s)  is  that  to  compare  two  queue¬ 
ing  systems,  it  is  “fairer”  to  drive  the  two  systems  using  the  same  stream  of  random 
numbers.  If  CRN’s  are  used  and  one  system  is  treated  “unfairly”  by  receiving  a  particu¬ 
larly  unlucky  string  of  random  inputs,  then  both  are.  Of  course,  in  making  this  statement, 
we  are  implicitly  assuming  that  both  systems  respond  in  a  similar  manner  to  the  driving 
inputs.  This  suggests  that  the  method  of  CRN’s  depends  on  monotonicity. 

Assume  that  XUX2  a re  real-valued  SIMC’s  (X,  =  (lf,„  :  n  >  0))  and  let  fltfa  be  non¬ 
decreasing  functions.  Let  Yin  =  fi(Xin)  and  suppose  that  (Y<»  :  n  >  0)  (i  =  1,2)  is  ergodic  in 
the  sense  that 

F<(n)  =>  m 

as  n  -+  oo.  Assuming  that  h*  represents  the  “performance”  of  system  »,  the  simulator  is 
often  interested  in  estimating  the  difference  a  =  h\  -  w  in  performance  between  the  two 
queueing  systems.  The  naive  estimation  approach  would  involve  simulating  Xi  and  X2 
independently  and  estimating  a  via  a/(n)  =  ?i(n)  -  F3(r»).  Assuming  that  there  exists  finite 
01,02  such  that 

(12.1)  n1/3(F<(n)  -/*%)=»’  V<JV( 0, 1), 

it  is  easily  seen  that 

n1^a(a/(n)  -  a)  =>  <rN[ 0, 1) 

where  <r3  =  <r\  +  <r|. 

Suppose  now  that  the  method  of  CRN’s  is  used.  Specifically,  suppose  that  Xi,„+i  = 
MXi,n,fjn+1)  and  X2,n+1  =  h2(X2,n,i'n-n)  where  (rj„  :  n  >  1)  is  i.i.d.  independent  of  Xli0  and 


(i/n  n  >  1)  is  i-i-d.  independent  of  X2,0-  However,  we  now  require  that: 


{X\fi,X2,<urln>vn  :  n  >  1}  is  an 

(12.2) 

associated  family  of  r.v.’s 

A  standard  approach  to  obtaining  nontrivial  association  between  the  r?„’s  and  t/n’s  is  to 
generate  both  rjn  and  vn  by  inversion  from  the  same  r.v.  Un: 

n«  =  F-^Vn) 
vn  =  G~l(Un), 


where  F'1,  G~l  are  the  inverse  distribution  functions  for  rt,u  respectively. 

However,  in  many  settings,  one  can  obtain  the  required  association  without  explicit  use 
of  inversion.  For  example,  suppose  that  P{vnedx }  =  F{dx/82)  and  P{rjncdx}  =  F(dx/61),  where 
9U  $2  are  positive.  Thus,  the  distributions  of  rjn  and  vn  differ  by  a  change  of  “scale” .  Suppose 
that  the  sequence  (?;«  :  n  >  1)  is  generated  arbitrarily  (e.g.  by  acceptance-rejection).  If  the 
vn's  are  derived  from  the  »7n’s  via 

—  (^a/^l)»7ni 

then  (12.2)  follows.  Thus,  it  is  not  necessary,  in  this  change-of-scale  setting,  to  use  inversion 
in  order  to  apply  CRN’s.  In  a  similar  fashion,  explicit  inversion  can  be  avoided  when  the 
distribution  of  vn  differs  from  that  of  rjn  by  a  change  of  “location" .  Changes  of  scale  and 
location  arise  frequently  when  making  comparison  of  stochastic  systems. 

Let  ac(n)  =  Fi(n) - P3(n)  be  the  CRN  estimator,  in  which  the  input  sequences  (rin  :n>  l) 
and  (i/„  :  n  >  l)  are  dependent  as  in  (12.2).  Assume  that 


(12.3) 
and  that 

(12.4) 


n1/3(ac(n)  —  a)  =*•  <reN(0, 1) 


(nl(Pl(n)  -  mi)3  +  (F2(n)  -  n)3) :  n  >  l)  is  a 
uniformly  integrable  family  of  r.v.’s. 


The  following  theorem  is  easily  proved  by  the  methods  of  HEIDELBERGER  and  IGLE- 
HART  (1979). 


THEOREM  13:  Suppose  XltX2  are  SIMC’s  with  fx,f2  non-decreasing.  If  (12.1)-(12.4) 
are  in  force,  then  a3  <  a]. 
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Because  of  the  stochastic  monotonicity  present  in  many  queueing  systems,  this  suggests 
that  a  variance  reduction  will  often  be  achieved  by  using  CRN’s  to  compare  the  perfor¬ 
mance  of  two  queueing  systems.  An  efficiency  improvement  will  result  if  the  variance  is 
reduced  and  if  the  computational  effort  required  to  associate  the  »jn’8  and  v„’s  is  not  too 
large.  Note  that  if  inversion  was  used  to  generate  the  f?n’s  and  v„’s  independently,  one  can 
continue  to  use  inversion  in  order  to  associate  the  »>„’s  and  t/n’s,  provided  that  the  same 
stream  of  common  uniforms  is  used  for  both  families  of  r.v.’s.  In  this  case,  the  effort  re¬ 
quired  to  associate  the  sequences  is  no  greater  than  the  effort  required  in  the  independent 
case. 

An  idea  similar  to  CRN’s  can  be  used  to  estimate  m  =  Mi  (see  (12.1)).  Henceforth,  for 
the  remainder  of  this  section,  we  drop  the  subscript  from  the  first  SEMC.  Note  that  now 
we  are  dealing  with  a  single  SIMC  so  that  no  comparison  is  being  made.  Suppose  that  the 
r?„’s  defining  X  are  generated  by  inversion,  so  that  t]n  =  F~1[Un)  when  Ul,U2,...  are  i.i.d. 
uniform  r.v.’s.  Observing  that  the  antithetic  r.v.’s  1  - UU1  -  U2t...  are  also  i.i.d.  uniforms, 
we  can  define  the  antithetic  chain  X^  via  X'0  =  X0  and  F_x(l  -  £/B+1)).  Let 

=  f(X'n)  and  consider  the  antithetic  estimator 

*<.(«)  =  ±(P(n)  +  P'(n)) 

To  compare  the  variance  of  this  estimator  with  that  of  F(  ),  note  that  a„(n)  corresponds 
to  simulating  the  dynamics  of  the  SIMC  for  a  total  of  2n  transitions.  Thus,  a„(n)  must  be 
compared  to  P(2n).  From  (12.1),  the  relative  variance  quantity  is  <t2/2.  Suppose  that 

(12.5)  n1/3(aa(n)  -  y)  =>  <raN(0, 1) 

as  n  -♦  oo.  The  following  result  can  be  proved  by  the  methods  of  HEIDELBERGER  and 
IGLEHART  (1979). 

THEOREM  14:  Suppose  X  is  a  SIMC  and  f  is  monotone.  If  (12.1),  (12.4),  (12.5)  are  in 
force,  then  a\  <  aJ/2. 

Again,  due  to  the  stochastic  monotonicity  present  in  many  networks,  we  expect  that 
the  method  of  antitheses  will  often  provide  a  variance  reduction  in  the  queueing  context. 
A  disadvantage  of  antithetics,  as  opposed  to  CRN’s,  is  that  it  appears  inversion  must  be 
used  to  generate  the  corresponding  input  streams  of  r.v.’s. 
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13.  VRT  6:  SIMULATING  THE  ERROR  IN  HEAVY  TRAFFIC 

As  indicated  in  Section  6,  the  amount  of  computer  time  t  required  to  simulate  the 
steady-state  mean  waiting  time  EW  must  satisfy  t  >  (l  —  p)-2  as  p  /  1,  in  order  to  estimate 
EW  to  a  fixed  degree  of  relative  precision  c.  Thus,  the  problem  of  estimating  the  quantity 
EW  gets  harder,  in  a  relative  sense,  as  p  /  1.  MINH  and  SORLI  (1983)  developed  an 
elegant  method  for  avoiding  this  difficulty  when  EW  is  the  steady-state  mean  waiting  time 
in  the  GI/G/ifoo  queue. 

Let  Vo,  Ui  be  the  service  time  of  the  zero’th  customer  and  the  first  interarrival  time, 
respectively,  in  the  GI/G/ l/oo  queue.  Then,  MARSHALL  (1960)  showed  that  if  EV0  < 
EUU  E(V2  +  Uf)  <  oo,  then 


EW  =  - 


EZ 3  _  EP_ 
2  EZ  El 


where  Z  =  V0-Ui  and  /  is  the  length  of  the  first  idle  period  in  a  GI/G/ l/oo  queue  in  which 
the  zero’th  customer  arrives  at  t  =  0.  Thus,  estimating  a  =  EW  is  equivalent  to  estimating 
a  =  g{EI2,EI)  where  G(x,  y)  =  -EZ^/lEZ  -  x/y;  the  methods  of  Section  4  can  then  be  used 
to  estimate  a.  This  was  the  basic  idea  of  MINH  and  SORLI  (1983). 

Noting  that  EW - EZ2 /2 EZ  as  p  /  1  is  the  classical  diffusion  approximation  for  EW, 

we  recognize  that  the  approach  taken  here  is  to  use  simulation  only  to  estimate  the  error 
term  in  the  diffusion  approximation  to  EW.  Extensions  of  this  idea  to  the  GI/G/s/ oo  queue 
appear  in  MINH  (1987). 

14.  VRT  7:  IMPORTANCE  SAMPLING 

Consider  the  problem  of  estimating  the  parameter  a  =  P{W  >  w},  where  W  is  the 
steady-state  mean  waiting  time  of  a  customer  in  a  GI/G  /l/oo  queue  in  which  pel.  If  to  is 
large,  the  event  {W  >  to)  occurs  rarely,  so  that  conventional  simulation  is  inefficient.  The 
idea  underlying  importance  sampling  is  to  alter  the  dynamics  of  the  queueing  system  via  a 
“change  of  measure”  so  that  the  event,  under  the  altered  dynamics,  occurs  more  frequently 
(i.e.  the  “important”  event  occurs  more  often) .  Of  course,  in  order  to  account  for  the  new 
dynamics,  the  estimator  must  be  suitably  modified. 

To  illustrate,  suppose  that  P{W  >  w}  is  estimated  by  using  the  fact  that  W  =  m*x(Sk 
k  >  0),  where  (Sn  :  n  >  0)  is  the  random  walk  with  negative  drift  and  So  =  0  associated  with 
the  GI/G/ l/oo  queue.  Then,  a  can  be  re-expressed  as  a  =  P{T  <  oo)  where  T  =  inf{n  >  0  : 
S„  >  u»}.  Suppose  the  i.i.d.  summands  of  the  random  walk  have  common  distribution  F, 


where  F  has  a  moment  generating  function  <p  which  converges  in  a  neighborhood  of  the 
origin.  Since  (S*  :  n  >  0)  has  negative  drift,  it  follows  that  <p'{0)  <  0.  Thus,  if  there  exists  a 
positive  root  to  <p{8)  =  l,  it  must  be  unique  (by  the  convexity  of  <p).  Suppose  that  such  a 
unique  root  S’  exists. 

Rather  than  generating  the  summands  of  the  random  walk  from  the  distribution  F, 
consider  generating  the  summands  from  the  distribution 

F[dx)  =  t>'aF{dx). 

Then,  it  is  straightforward  to  verify  that 

a  =  P{T  <  00}  =  S{e-e'Sr  ,T  <  00} 

where  E{)  corresponds  to  the  expectation  in  which  the  summands  have  distribution  F. 
Since  T  <00  a.s.  under  P  (the  random  walk  has  positive  drift  under  £), 

(14.1)  a  =  Etxp(-6‘Sr). 

SIEGMUND  (1976)  and  ASMUSSEN  (1985)  show  that  simulation  based  on  (14.1)  is  much 
more  efficient  than  simulation  under  F.  The  key  is  that  F  turns  the  drift  positive,  so  that 
the  event  (T  <  00}  becomes  certain.  The  factor  «xp(-fl*Sr)  is  introduced  to  compensate  for 
the  new  measure  F. 

This  area  is  currently  active,  from  a  research  viewpoint.  For  an  application  of  impor¬ 
tance  sampling  to  networks  of  queues,  see  PAREKH  and  WALRAND  (1986). 

15.  GRADIENT  ESTIMATION  FOR  QUEUEING  SYSTEMS 

Consider  a  queueing  system  in  which  the  dynamics  depend  on  a  parameter  8eRm. 
Specifically,  let  Q  =  (Q{t)  :  t  >  0)  be  a  GSMP  in  which  the  clock-setting  distributions 
and  routing  probabilities  depend  on  8  i.e.  under  parameter  8,  the  clocks  are  driven  by 
distributions  F(8,  and  customers  axe  routed  via  p(0, «';*,«).  Clearly,  the  queueing 

parameter  a  to  be  estimated  by  the  simulation  algorithm  then  depends  on  8 ,  i.e.  a  =  a(8).  A 
current  area  of  vigorous  research  concerns  the  efficient  estimation  of  Va(tf).  The  interest  in 
the  problem  derives  from  the  fact  that  gradient  evaluations  form  an  important  ingredient 
of  most  efficient  optimization  routines. 

Two  basic  methods  have  been  suggested  for  attacking  this  problem.  The  first  tech¬ 
nique,  called  infinitesimal  perturbation  analysis  (IPA),  typically  assumes  that  the  routing 
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probabilities  are  independent  of  0,  so  that  all  0-dependence  is  incorporated  via  the  distribu¬ 
tions  F(0,  Also,  the  parameter  a  =  a(0)  usually  takes  the  form  a(0)  =  g(EtY)  (£#(•) 

denotes  expectation  under  the  ^-dynamics),  where 

(15.1)  E,Y  =  E,  /  h(Q(s))ds, 

N  is  an  integer-valued  r.v.,  h  :  S  -*  Rd,  g  :  Rd  -*  R.  Suppose,  for  simplicity,  that  8eR. 
The  idea  is  to  observe  that  if  f(9,  ja'.e',*, e)  is  a  scale  family  in  9  (i.e.  F[9,  ■; s',  tf, «, e)  = 
F{  /9;s',e',»,e)),  the  clock-setting  variates  associated  with  the  0-system  can  be  generated 
as  6Ri(a',e',Bte),  9Ra(a',e',a,e),.. .  where  the  sequence  (Rn(a',e',a,e)  :  n  >  1)  is  that  associated 
with  /’(•;»',  «',*,«).  Let  Q(0)  =  (Q(0,t)  :  t  >  0)  be  the  GSMP  in  which  the  clock  readings  are 
set  by  the  sequence  (9Rn(a',e',»,e)  :  n  >  1).  The  key  observation  is  that  for  Y  of  the  form 

(15.1) ,  EeY  =  £T(0)  where 

,1  (9,N(0))  N(d)-1 

(15.2)  Y(9)=  h(Q(9,a))da  =  £  *(*»(*))**(&(*).  Cfc(0)) 

J°  k=0 

Given  0O,  the  r.v.’s  (l\T(0),5fc(0) :  1  <  Jb  <  N{9))  are  typically  constant  in  some  neighborhood  of 
0O.  Therefore,  it  follows  from  (15.2)  that  in  computing  the  path-by-path  derivative  Y'(9)  of 
y(0),  we  need  only  to  compute  the  derivative  of  t*(St(0o),C*(0))  with  respect  to  0.  In  many 
queueing  systems,  this  calculation  can  easily  be  done.  This  allows  us  to  calculate  a'(0) 
as  a'(0)  =  Vg(EY(9)YEY'{9)  (note  that  a'(0)  =  g(EY(9))  is  a  transient  parameter  of  the  type 
described  in  Section  4,  where  g(yi,va)  =  ^?(»i)*ya  and  y(0)  =  (V'(0),y'(0))),  provided  that 

(15.3)  4 EY(9)  m  EY'(9). 

do 

A  similar  method  works  for  location  families  (i.e.  F(9,  \$',e',a,e)  =  F(  -9-,a' ,e' ,*,«))  and  even 
more  generally. 

While  the  above  method  works  efficiently  on  certain  classes  of  queueing  systems,  it 
is  not  valid  universally.  In  particular,  there  is  a  large  class  of  non-pathological  queueing 
networks  for  which  (15.3)  is  false;  see  HEIDELBERGER  et  al  (1988)  for  details.  For 
further  background  on  IPA,  see  ZAZANIS  and  SURI  (1988). 

The  second  technique  for  evaluating  gradients  uses  the  method  of  likelihood  ratios. 
Here,  the  0-dependence  may  be  reflected  in  both  the  clock-setting  distributions  F(9,-,  a', *,  e) 
and  routing  probabilities  p(0, «';«,«).  However,  the  following  density  conditions  need  to  be 
in  force: 


(15.4)  i)  F(9,  dx ; s,  e)  =  f(6 ,  *; e\  «,  e)  F(0O.  <**; «, «) 

when  /(tf,  «,e)  is  positive  and  continuously  differentiable  in  6. 

ii)  p(9, 9';s,e)  is  either  identically  zero  or  identically  positive  as  a  function 
of  0,  for  each  triplet 

Given  a  transient  parameter  of  the  form  (4.1)  (note  that  Y  need  not  be  of  the  form 
(15.1)  here),  the  density  assumption  (15.4)  permits  us  to  estimate  a(0)  =  g(EeY)  by  using 
the  likelihood  ratio  identity 

(15.5)  EeY  =  EetYL(9) 

where  L(9)  =  dPt/dPe<>  is  the  Radon-Nikodym  derivative  of  the  distribution  P,  with  respect 
to  Pto  (existence  of  dP»/dPto  is  assured  by  (15.4)).  Hence,  if  0  is  scalar,  this  suggests  that 
c'(0o)  =  Vg(E9oY),E9oYL,(0 0),  provided  that 

(15.6)  JjE»oYL(*)  =  E9oYL'{90). 

The  derivative  interchange  (15.6)  is  typically  valid  for  queueing  systems  of  arbitrary  struc¬ 
ture.  As  in  the  case  of  IPA,  it  is  easily  seen  that  Vg(EeaY)tEeoYL'{90)  is  itself  a  transient 
parameter  of  the  form  (4.1),  so  that  the  methodology  of  Section  4  may  be  used  to  develop 
an  efficient  estimator  for  a'(0o).  For  more  details  on  this  approach,  including  a  formula  for 
L'(tfo),  see  GLYNN  (1987). 


APPENDIX 

Proof  of  Proposition  2:  Let  N{t)  =  max{n  >  0  :  A (n)  <  t }.  Since  n_1A(n)  -♦  a.s.,  it 

follows  that  N(t)/t  -*  \/m  a.s.  Abo,  it  b  apparent  that 

Hi  r *  ,"<*>  II 

(A.l)  ±  /  MQW)^-7  E^fc)t*(5fc,Cfc)  <  ||A(5^(,))|| t*(5w(t),Cjv(())/t 

II  *  Jo  1  k=0  I 

.  W(‘) 

(A. 2)  -  E  **(**.  <?*)  -  1  < 

c  Jb  =  0 

Since  JV(t)  -»  oo  a.s.,  evidently  iV(t) ~ 1  ||A(Sfc)||t*(SfclC*)  -♦  n3  a.s.,  which  implies  that 
11  l|c* (S^(t) ,  C'/sr(«))/JV'(*)  -» 0  a.s.  But  N(t)/t  -*  l//*a  a.s.,  so  the  right-hand  side  (RHS)  of 

(A.l)  converges  to  0  a.s.  Similarly,  we  see  that  the  RHS  of  (A.2)  goes  to  0  a.s.  Inequalities 
(A.l)  and  (A.2)  then  yields 

m.3)  *  r  ww)*  -  sfily ’■»■&'*)/»<?)  _  „  a.s. 

J  t/o  E^f(Sh,Ch)/N(t) 

The  hypothesized  strong  laws  then  can  be  applied  to  the  RHS  of  (A.3)  to  obtain  the  result. 

Proof  of  Proposition  3:  By  Theorem  4,  p.  84,  of  CHUNG  (1967),  E\\Y(y)\\3  <  oo 
for  all  y  if  £||Y(x)||3  <  oo.  It  b  well  known  that  n  =  .Ev  J0a(t(w))  A(Q(*))da/.EI,A(f-(y))  b  in¬ 
dependent  of  y.  Write  Y(y)  =  (Y'(y),T(y)),  where  T(y)  =  A(r(y)).  Then,  an  easy  calcula¬ 
tion  shows  that  Vg(EY{y)YC[y)Vg(EY{y))  =  var„  Z(y)/(£rr(y))3  where  Z(y)  =  VJfc(M)‘[K'(y)  - 
(Zvr(y)/Zvr(y))r(y)].  By  Theorem  1,  p.  99,  of  CHUNG  (1967),  var yZ(y)/EvT(y)  b  inde¬ 
pendent  of  y.  Thus,  the  proof  b  complete  if  we  show  that  E„Pi(y)/EvT[y)  is  independent 
of  y.  But  Epx(y)/ET{y)  =  -Ex(*M*)/£^7,M*),  where  v  b  the  stationary  distribution  of  the 
embedded  chain  of  Q  and  A  b  the  generator  of  Q;  thb  latter  ratio  b  independent  of  y. 

Proof  of  Proposition  5:  Put  G( )  =  P{t?„  <  }  and  F(x,  •)  =  P{A(x,nn)  <  }•  Let  F-1(x,y) 
be  the  inverse  function  defined  by  F-l(x,y)  =  sup{*  :  F(x,x)  <  y).  Set  X'0  =  X0  and  define 
( X 'n  :  n  >  1)  via  the  recursion  X'n+l  =  h'[X'nlt)n+i)  where  h'{x,y)  =  F-1(x,G(y)).  Note  that  X'  b 
a  MC  with  transition  dbtribution 

P{K+ 1  <  v\K  =  *}  =  P{h'(x,nn+i)  <  v) 

=  P{F~l(x,U)  <  y}; 

G( ij)  =  U  is  a  uniform  r.v.  since  G  b  continuous.  But  P{F_1(x, U)  <  y)  =  F(x,y)  because 
F_1(x,  )  b  an  inverse  function.  Thus,  it  follows  that  X=X'.  Monotonicity  of  V  in  the  second 


co-ordinate  is  easy.  For  the  first  co-ordinate,  use  the  fact  that  F(-,y)  is  non-increasing  to 
conclude  that  F-1(,y)  is  non-decreasing. 


Proof  of  Theorem  12:  Because  of  the  uniform  integrability,  it  suffices  to  show  that 

ft  n+1 

var  H  /«(-**)  ^ var  ]C  /(**)• 

k=l  k=3 

i.e. 

£  «“•  f‘Wk)  +  2  £  co MX,)) 

fc=l  W 

n 

<  ^v"/(^+i)  +  2EC0V(^+»).  /(*>+!)) 

*=i  ***' 

Note  that  var/«pffc)  =  var-E{/(Xfc+i)Pf*}  <  var  /(-Yj,+1)  by  the  principle  of  conditional 
Monte  Carlo.  To  complete  the  proof,  we  therefore  need  to  show  that  Efe(Xk)fe(X,)  < 
Ef(Xk+1)f(X,+1)  for  A:  <  j.  By  the  Markov  property,  Ef(Xk+1)f() ii+t)  =  Ec,-k(Xk)  where 
ci(x)  =  E{f[Xi)f[Xi+i)\X0  =  *}.  Notice  that  if  X0  =  x,  Xn  is  a  non-decreasing  function  of 
the  independent  r.v.’s  nii •••»'»«•  Since  /  is  monotone,  it  follows  that  }{X i)  and  f(Xl+1)  are 
associated  r.v.’s,  conditional  on  X0  -  x.  Thus,  a  standard  inequality  (see  BARLOW  and 
PROSCHAN  (1975),  p.  31)  yields 

ct(x)  >  EMX,) \X0  =  *)  •  E{f(Xl+1)\X0  =  *} 

=  /.(*)  ■  E{MXt)\X0  =  x). 

If  we  integrate  the  above  inequality  with  respect  to  P{Xkedx},  we  obtain  Ef(Xk+i)f(X,+i) 
>  EMxk)MXi). 
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